Assessment and management system for rehabilitative conditions and related methods

ABSTRACT

Systems and methods are disclosed for the measurement of patient outcomes in a rehabilitation setting. In one exemplary method, an assessment relating to self-care is provided, an assessment relating to mobility is provided, and an assessment relating to cognition is provided, wherein the assessments have been pre-selected using item response theory.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a continuation of U.S. patent application Ser. No. 16/142,313, filed Sep. 26, 2018, now U.S. Pat. No. 11,380,425, which claims the priority benefit of U.S. Provisional Patent Application 62/563,960, filed Sep. 27, 2017, both of which are incorporated herein by reference in their entirety.

TECHNICAL FIELD

The present disclosure relates generally to rehabilitative techniques and, more particularly, to a computer-assisted method for assessing a patient.

BACKGROUND

An “outcomes measure,” also known as an “outcomes assessment tool,” is a series of items used to determine varying medical conditions or the functional status of a patient. One outcomes measure is the Functional Independence Measure (FIM®), which provides a method of measuring functional status. The assessment contains eighteen items: thirteen motor tasks and five cognitive tasks. Each task is rated by a clinician on a seven-point ordinal scale that ranges from total assistance to complete independence, so scores range from 13 (lowest) to 91 (highest) for motor skills and from 5 to 35 for cognition skills. Items include eating, grooming, bathing, upper body dressing, lower body dressing, toileting, bladder management, bowel management, bed to chair transfer, toilet transfer, tub or shower transfer, locomotion (ambulatory or wheelchair level), stairs, comprehension, expression, social interaction, problem solving, and memory.
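
The subscore ranges follow directly from the item counts and the 1-7 rating scale. The following Python sketch sums lists of FIM item ratings into motor and cognitive subscores after checking them. The function name `fim_subscores` and the list-based input layout are assumptions made for illustration; the FIM instrument itself does not define any software interface.

```python
# Hypothetical helper for illustration; the FIM does not define a software API.
MOTOR_ITEMS = 13
COGNITIVE_ITEMS = 5
MIN_RATING, MAX_RATING = 1, 7  # 1 = total assistance, 7 = complete independence


def fim_subscores(motor, cognitive):
    """Validate FIM ratings and return (motor_total, cognitive_total)."""
    if len(motor) != MOTOR_ITEMS or len(cognitive) != COGNITIVE_ITEMS:
        raise ValueError("expected 13 motor and 5 cognitive ratings")
    for rating in [*motor, *cognitive]:
        if not MIN_RATING <= rating <= MAX_RATING:
            raise ValueError(f"rating {rating} is outside 1-7")
    return sum(motor), sum(cognitive)


print(fim_subscores([1] * 13, [1] * 5))  # (13, 5): lowest possible subscores
print(fim_subscores([7] * 13, [7] * 5))  # (91, 35): highest possible subscores
```

Because every item is rated from 1 to 7, the motor subscore runs from 13 × 1 = 13 to 13 × 7 = 91 and the cognitive subscore from 5 × 1 = 5 to 5 × 7 = 35.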

The FIM measure uses scoring criteria that range from a score of 1 (which reflects total assistance) to a score of 7 (which reflects complete independence). A score of 1 is intended to reflect that a patient can perform less than 25% of the task or requires more than one person to assist. As a result of this scoring system, many patients who make improvements in a free-standing inpatient rehabilitation facility or an inpatient rehabilitation unit within a hospital do not necessarily register gains in their outcomes score during their rehabilitation. For instance, a spinal cord injury patient may make significant improvements in fine finger motor skills during rehabilitation, allowing the patient to use a computer or a smartphone. However, his or her FIM score in this situation would not improve.

An outcomes measure is needed that more accurately captures assessment of a patient's medical condition or functional status. Additionally, an outcomes measure is needed that helps to better identify areas in which patients, such as rehabilitation patients, can improve.

An “item” is a question or other kind of assessment used in an outcomes measure. For example, one item on an outcomes measure known as the Berg Balance Scale instructs a patient as follows: “Please stand up. Try not to use your hands for support.” A “rating” is a score outcome or other evaluation in response to an item assessment. For example, the ratings for the Berg Balance Scale item are as follows: a rating of 4, which reflects that the patient is able to stand without using her hands and stabilize independently; a rating of 3, which reflects that the patient is able to stand independently using her hands; a rating of 2, which reflects that the patient is able to stand using her hands after several tries; a rating of 1, which reflects that the patient needs minimal aid from another to stand or to stabilize; and a rating of 0, which reflects that the patient needs moderate or maximal assistance from another to stand.

Classical test theory is a body of related psychometric theory that predicts outcomes of educational assessment and psychological testing, such as the difficulty of items or the ability of test-takers. It is a theory of testing based on the idea that a person's observed or obtained score on a test is the sum of a true score (error-free score) and an error score. Classical test theory assumes that each person has a true score, T, that would be obtained if there were no errors in measurement. A person's true score is defined as the expected number-correct score over an infinite number of independent administrations of the test. Unfortunately, test users never observe a person's true score, only an observed score, X. It is assumed that the observed score equals the true score plus some error, or X=T+E, where X is the observed score, T is the true score, and E is the error. The reliability, i.e., the overall consistency of a measure, of the observed test score X is defined as the ratio of true score variance to observed score variance. Because the variance of the observed scores can be shown to equal the sum of the variance of true scores and the variance of error scores, this ratio behaves like a signal-to-noise ratio: the reliability of test scores becomes higher as the proportion of error variance in the test scores becomes lower, and vice versa. The reliability is equal to the proportion of the variance in the test scores that could be explained if the true scores were known. The square root of the reliability is the correlation between true and observed scores. Estimates of reliability can be obtained by various means, such as the parallel-test method or a measure of internal consistency known as Cronbach's coefficient α. Under classical assumptions (in particular, uncorrelated errors), Cronbach's α can be shown to provide a lower bound for reliability; thus, the reliability of test scores in a population is at least as high as the value of Cronbach's α in that population.
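
Cronbach's α is computed from item and total-score variances as α = k/(k − 1) · (1 − Σσᵢ²/σ_X²), where k is the number of items, σᵢ² is the variance of item i, and σ_X² is the variance of the total (sum) score. A minimal Python sketch, assuming a complete respondents-by-items matrix with no missing ratings; the function name `cronbach_alpha` is hypothetical:

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items matrix (hypothetical helper).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    Sample variances are used throughout; the ddof choice cancels in the ratio.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    items = list(zip(*scores))  # transpose: one tuple of ratings per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


data = [[1, 2], [2, 1], [3, 4], [4, 3]]  # 4 respondents, 2 items (made-up ratings)
print(round(cronbach_alpha(data), 2))  # 0.75
```

If all items were identical, the total-score variance would be k² times each item variance and α would equal 1; item responses that share less common variance pull α down.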

SUMMARY

The problem of accurately measuring improvements in rehabilitation patients is solved by developing an outcomes assessment that incorporates factor analysis and item response theory.

The problem of measuring improvements in rehabilitation patients is solved by asking a series of questions to the patient and returning a domain-specific and/or composite score.

The problem of improving care in rehabilitation patients is solved by predicting the domain-specific and/or composite score on an outcomes measurement of the patient and providing a clinical intervention if the domain-specific and/or composite score falls below the predicted score.
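
The comparison step in the last approach can be sketched as follows. How the predicted scores are produced is not specified here, so the sketch simply assumes observed and predicted domain scores expressed on the same scale. The function name `domains_below_prediction` and the optional `margin` tolerance are hypothetical.

```python
def domains_below_prediction(observed, predicted, margin=0.0):
    """Return, sorted, the domains whose observed score falls below the predicted score.

    observed, predicted: dicts mapping a domain name (e.g. "Self-Care") to a score
    on the same scale. margin is a hypothetical tolerance; 0.0 applies the plain
    "falls below the predicted score" rule.
    """
    return sorted(
        domain
        for domain, expected in predicted.items()
        if observed[domain] < expected - margin
    )


observed = {"Self-Care": -0.4, "Mobility": 0.2, "Cognition": 0.1}
predicted = {"Self-Care": 0.0, "Mobility": 0.1, "Cognition": 0.3}
print(domains_below_prediction(observed, predicted))  # ['Cognition', 'Self-Care']
```

Each returned domain would be a candidate for a clinical intervention; the same check applies unchanged to a composite score stored under its own key.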

DRAWINGS

While the appended claims set forth the features of the present techniques with particularity, these techniques, together with their objects and advantages, may be best understood from the following detailed description taken in conjunction with the accompanying drawings of which:

FIG. 1 displays a flowchart for certain exemplary methods for preparing a preliminary outcomes measure;

FIG. 2 displays a flowchart for electronically collecting ratings for the items in the preliminary outcomes measure;

FIG. 3 displays an exemplary IRT-based scoring system, comparing it against a FIM® score known in the prior art;

FIG. 4 displays an exemplary plot of certain data relating to patient scores in the Self Care, Cognition, and Mobility domains;

FIG. 5 further shows a patient's current and expected functional status on each of the items/tasks in the Self Care domain;

FIG. 6A and FIG. 6B show an “FIM Explorer” section, with features that allow clinicians to select and/or set the goal for each FIM-specific task;

FIG. 7 shows a comparison chart; and

FIG. 8 displays various plots for the Self Care domain in comparison with a patient's FIM score.

DETAILED DESCRIPTION

A “bifactor model” is a structural model wherein items cluster onto specific factors while at the same time loading onto a general factor.

The term “categorical” is used to describe response options for which there is no explicit or implied order or ranking.

A “Comparative Fit Index” (CFI) compares the performance of the constructed structural model against the performance of a model that postulates no relationships between variables. A good-fitting model generally has a CFI greater than 0.95.

A “complex structure” is a CFA structural model where at least one item loads onto more than one factor.

“Confirmatory factor analysis” (CFA) is a form of factor analysis utilized where a psychometrician has an understanding of how the latent traits and items should be grouped and related. A structural model is developed, and this is fit to the data. A goal of the model is to achieve a good fit with the data.

A “constraint” is a restriction imposed on a model for the sake of mathematical stability or the application of content area theory. For example, if two factors in a confirmatory factor analysis are not expected to have any relationship between them, a constraint on that correlation (requiring that it equal 0.00) can be added to the model.

A “continuous” variable is a variable that is measured without categories, like time, height, weight, etc.

A “covariate” is a variable in a model that is not on a measure, but may still have some explanatory power. For example, in rehabilitation research, it may occasionally be useful to include covariates for age, sex, length of stay, diagnostic group, etc.

“Dichotomous” describes response options that are ordinal with two categories (e.g., low versus high). Alternatively, it may refer to items that are scored correct vs. incorrect, which are conceptually also ordinal responses with two categories.

“Differential item functioning” (DIF), in item response theory, is a measure of how the parameter estimates may behave differently from group to group (in different samples) or from observation to observation (over time).

“Difficulty,” in item response theory, is the minimum level of a latent trait necessary to respond in a certain way. On a measure with dichotomous responses, there is a single difficulty (e.g., for 1PL and 2PL items, the minimum level of the latent trait that will raise the probability of answering correctly to 50% or greater). On a measure with polytomous responses, “difficulty” is better described as “severity,” as there is not usually a correct or incorrect answer. On a measure with polytomous responses, the number of difficulties estimated is k−1, where k is the number of response options. These difficulties describe the level of latent trait necessary to endorse the next highest category. A difficulty is sometimes also referred to as a threshold.

“Dimension” refers to the number of latent traits a measure addresses. A measure that records one trait is said to be unidimensional, while a measure recording more than one trait is referred to as multidimensional.

“Discrimination” is the ability of a test to differentiate between people with high versus low ability of the latent trait. Similarly, it describes the magnitude of the relationship between an item and a latent trait. Conceptually, it is very similar to a factor loading and mathematically, it can be converted into a factor loading.
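
Under a unidimensional model with a standard-normal latent trait, a discrimination a and a standardized factor loading λ are related by λ = a/√(1 + a²) in the normal-ogive metric; a discrimination on the logistic metric is first divided by the scaling constant 1.702. The sketch below assumes that convention (software packages differ in whether they apply the constant), and the function names are hypothetical.

```python
import math

D = 1.702  # constant that approximately rescales a logistic curve to a normal ogive


def discrimination_to_loading(a, logistic=True):
    """Standardized loading for a unidimensional item with a N(0, 1) latent trait."""
    a_normal = a / D if logistic else a
    return a_normal / math.sqrt(1.0 + a_normal ** 2)


def loading_to_discrimination(loading, logistic=True):
    """Inverse of discrimination_to_loading; requires -1 < loading < 1."""
    if not -1.0 < loading < 1.0:
        raise ValueError("loading must lie strictly between -1 and 1")
    a_normal = loading / math.sqrt(1.0 - loading ** 2)
    return a_normal * D if logistic else a_normal
```

For example, a logistic discrimination of about 1.7 corresponds to a loading of roughly 0.71, and the two functions invert each other exactly.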

“Endorse” means to select a response option.

“Equality constraint,” in item response theory and confirmatory factor analysis, is a mathematical requirement that constrains discriminations or factor loadings to be equal, for example when only two items load onto a factor.

“Equating” refers to the use of item response theory to draw similarities between the scores on different measures that record the level of the same latent trait(s). Equating may also be used to compare alternate forms of the same measure.

“Error” is a term used to describe the amount of uncertainty surrounding a model. A model with parameter estimates that are very close to the observed data will have low amounts of error, while one that is quite different would have a large amount of error. Error may also indicate the amount of uncertainty surrounding a specific parameter estimate itself.

“Estimation” refers to the statistical process of deriving parameter estimates from the data. These procedures may be performed using specialized psychometric software known in the art.

“Exploratory factor analysis” is a form of factor analysis that clusters items according to their correlations. This is often done without any direction from the analyst other than how many factors should be extracted. The groupings are then “rotated.” Rotation methods attempt to find factor loadings that are indicative of simple structure by making sure that factor loadings are pushed towards −1.00, 0.00, or 1.00.

“Factor” in factor analysis describes a latent trait. Unlike a latent trait in item response theory, factors do not normally have scores associated with them.

“Factor analysis” is a statistical method for determining the strength and direction of the relationships between factors and items. The data on which factor analysis is based are the correlations between items. Factor analysis can accommodate either ordinal or continuous data, but not unordered categorical data. It is possible to compute scores from factor analysis, but IRT scores are more reliable. Factor analysis may be either exploratory or confirmatory.

“Factor correlation” refers to a correlation between two factors. A CFA model with correlated factors is called “oblique.”

“Factor loading,” in factor analysis, describes the magnitude of the relationship between an item and a factor. It is not mathematically the same as a correlation, though its scale and interpretation are similar. That is, values (usually) range from −1.00 to 1.00. A strong negative factor loading indicates a strong inverse relationship between an item and a latent trait, while a strong positive loading has the opposite interpretation. A factor loading of 0.00 indicates no relationship whatsoever.

“Fit statistics” or “fit index” refer to metrics used to quantify how well the model performs. Popular fit metrics in confirmatory factor analysis and structural equation modeling include the root mean square error of approximation (RMSEA), the comparative fit index (CFI), the Tucker-Lewis Index (TLI), and the weighted root mean-square residual/standardized root mean-square residual (WRMR/SRMR).
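
Three of these indices can be computed from model and baseline chi-square statistics. The sketch below uses common textbook formulas: RMSEA = √(max(χ² − df, 0)/(df·(N − 1))), CFI = 1 − max(χ²_M − df_M, 0)/max(χ²_M − df_M, χ²_B − df_B, 0), and TLI = (χ²_B/df_B − χ²_M/df_M)/(χ²_B/df_B − 1), where M is the fitted model and B the baseline (independence) model. Software packages may differ in details (for example, N versus N − 1 in RMSEA, or scaled chi-square values under categorical estimators), so this is an illustration rather than a replacement for the values reported by the estimation software. The function names are hypothetical.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation (N - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative fit index of model M against baseline (independence) model B."""
    num = max(chi2_m - df_m, 0.0)
    den = max(chi2_m - df_m, chi2_b - df_b, 0.0)
    return 1.0 if den == 0 else 1.0 - num / den


def tli(chi2_m, df_m, chi2_b, df_b):
    """Tucker-Lewis index; unlike CFI it is not bounded to [0, 1]."""
    ratio_b = chi2_b / df_b
    return (ratio_b - chi2_m / df_m) / (ratio_b - 1.0)


# Made-up chi-square values for illustration only.
chi2_m, df_m, chi2_b, df_b, n = 60.0, 40, 1000.0, 45, 300
print(round(rmsea(chi2_m, df_m, n), 3),
      round(cfi(chi2_m, df_m, chi2_b, df_b), 3),
      round(tli(chi2_m, df_m, chi2_b, df_b), 3))
```

With these made-up values the model would meet the RMSEA (below 0.08) and CFI/TLI (above 0.95) guidelines quoted in this section.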

“General factor,” in a bifactor model, refers to the factor onto which all items load.

“Graded response model” (GRM) is an extension of the two-parameter logistic model that allows for ordinal responses. Instead of only one difficulty, the graded response model yields k−1 difficulties, where k is the number of response categories.
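
In the graded response model, the probability of responding in category j or higher follows a 2PL-style curve, P*(X ≥ j) = 1/(1 + exp(−a(θ − b_j))), and the probability of exactly category j is the difference between adjacent cumulative curves. A minimal sketch, assuming categories scored 0 to k − 1, strictly increasing thresholds, and no 1.702 scaling constant (some software includes it); the function names are hypothetical:

```python
import math


def logistic(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def grm_probs(theta, a, thresholds):
    """Category probabilities for one graded-response item scored 0..k-1.

    thresholds: the k-1 strictly increasing difficulties b_1 < ... < b_{k-1}.
    """
    if any(b1 >= b2 for b1, b2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative curves P*(X >= j), padded with P*(X >= 0) = 1 and P*(X >= k) = 0.
    cum = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    return [cum[j] - cum[j + 1] for j in range(len(thresholds) + 1)]


probs = grm_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

The k category probabilities always sum to 1, and when the thresholds are symmetric about θ the two outermost categories are equally likely.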

“Hierarchical model” is a structural model where latent traits load onto other latent traits, forming a hierarchy.

“Higher-/lower-order factor,” in a hierarchical model, refers to the levels of the hierarchy: a higher-order factor is a type of latent variable onto which lower-order factors load.

“Index” is a term used to refer to a fit index/statistic (e.g., comparative fit index) or as a synonym for “measure.”

“Item” refers to the questions, tasks, or ratings on a measure that are addressed by a respondent or a respondent's representative (such as a clinician).

“Item characteristic curve (ICC)” is a graph that plots the probability of selecting different response options given the level of a latent trait. Sometimes it also is called a “trace line.”

“Item response theory” (IRT) is a collection of statistical models used to obtain scores and determine item behavior according to a structural model. In one commonly used form, IRT uses the response pattern of every person in the sample in order to get these item and score estimates. IRT uses data that are ordinal or categorical. Mathematically speaking, item response theory uses item and person characteristics in order to predict the probability that a person selects a certain response option on a given item.

“IRT score” is a score specific to IRT analysis that is given on a standardized scale. It is similar to a z-score. In one IRT scoring system, a score of 0.00 implies someone has an average level of a latent trait, a large negative score implies a low level of the latent trait, and a large positive value implies a high level of the latent trait.

“Latent trait” is similar to a factor in factor analysis, but used more often in item response theory. A latent trait is what a related set of items purports to measure. It may be used interchangeably with factor, domain, or dimension.

“Latent variable” is a term for a variable that is not measured directly. It includes latent traits.

“Linking” is similar to equating, but for item parameter estimates rather than scores.

“Load” is a verb used to describe what an item does on a factor. For example: “Item 4 loads onto both the local dependence factor and the general factor in this model.”

“Local dependence” (LD) is a violation of the local independence assumption in which items are related for some reason other than the latent trait. This can be due to a large number of reasons, such as similar wording, nearly identical content, or the location of the items on a measure (this last example occurs frequently on the last items of a long measure). If local dependence appears to exist in the data, it can be accounted for either by modeling a correlation between the items or by creating a local dependence factor.
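
One widely used screen for local dependence, offered here as an illustration rather than as part of the described method, is Yen's Q3 statistic: the correlation, across respondents, of the residuals (observed minus model-expected response) for each pair of items. The sketch below assumes the expected responses have already been computed from a fitted IRT model and that no item has constant residuals; the function names are hypothetical, and because cut-offs for a "large" Q3 vary in the literature, none is hard-coded.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nonzero variances assumed)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def yen_q3(observed, expected):
    """Q3 for every item pair; observed/expected are respondents-by-items matrices."""
    resid = [[o - e for o, e in zip(orow, erow)]
             for orow, erow in zip(observed, expected)]
    cols = list(zip(*resid))  # one residual vector per item
    k = len(cols)
    return {(i, j): pearson(cols[i], cols[j])
            for i in range(k) for j in range(i + 1, k)}
```

Item pairs with markedly positive Q3 values are candidates for the correlated-error or local dependence factor treatments described above.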

“Local independence” refers to an assumption in psychometrics that states that the behavior of items is due to the latent traits in the model and item-specific error and nothing else. When items violate this assumption, they are said to be locally dependent.

“Manifest variable” is a generic term for a variable that is measured directly, and includes items, covariates, and other such variables.

“Measure” or “measurement” refers to a collection of items that attempt to measure the level of some latent trait. It may be used interchangeably with assessment, test, questionnaire, index, or scale.

“Model” in psychometrics is a combination of the response model and the structural model. In general terms, it describes both the format of the data and how the data recorded in the model's variables should be related.

“Model fit” is a term used to describe how well a model describes the data. This may be done in a variety of ways, such as comparing the observed data against the predictions made by the model or comparing the chosen model against a null model (a model in which none of the variables are related). The metrics used to assess model fit are called fit statistics.

“Multidimensional” is a term used to describe a measure that records more than one latent trait.

“Multigroup analysis” in IRT refers to the process by which the sample can be split into different groups, and parameter estimates specific to each group may be estimated.

“Nominal model” is similar to the graded response model, but for items with response options that are categorical rather than ordinal.

“Oblique” is an adjective used to describe factors that are correlated.

“Ordinal” describes the way an item records data. For example, possible responses to an item are a series of categories ordered from low to high or high to low.

“Orthogonal” describes factors that are restricted to a zero correlation.

“Parameter estimate” is a statistically-derived value estimated by psychometric software. It is a generic term that may include things like item discriminations, factor loadings, or factor correlations.

“Path diagram” is a diagram meant to illustrate the relationships between items, latent traits, and covariates. In a path diagram, rectangles/squares represent observed variables (i.e., items, covariates, or any modeled variable for which there is explicitly recorded information), ovals/circles represent latent traits or variables for which there is no explicitly recorded information, one-headed arrows reflect one-directional relationships (as in a regression), and two-headed arrows reflect correlation/covariance between modeled variables.

“Polytomous” is a term for items with more than two response options, and may be either ordinal or categorical.

“Pseudobifactor model” is a bifactor model where not all items cluster onto specific factors. Instead, some items may only load onto the general factor.

“Psychometrician” is a kind of statistician that specializes in measurement.

“Psychometrics” describes the statistics used in creating or describing measures.

“Rasch model” is a response model that hypothesizes that all item discriminations are equal to 1.00. It usually is not used unless this assumption is true or nearly true. This assumption eases interpretation of scores and difficulties and allows use of item response theory on (relatively) small sample sizes, but it is very uncommon that all item discriminations behave identically. It is a simplified case of the two-parameter logistic model, which allows the item discriminations to vary. Because of this, the Rasch model is sometimes referred to as the one-parameter logistic model (1PL). It may be used when the responses are dichotomous.

A “respondent” is someone who answers items on a measurement.

A “response” is a respondent's answer to an item.

“Response categories” are the different options a respondent may select as a response to an item. If items yield dichotomous responses, the data are recorded as either correct (1) or incorrect (0).

“Response model,” in item response theory, refers to the way a measurement model handles the format of the responses. Popular response models include the Rasch model, the two-parameter logistic model, the three-parameter logistic model, the graded response model, and the nominal model.

A “response pattern” is a series of numbers representing a respondent's answers to each question on a measurement.

“Root mean square error of approximation” (RMSEA) is a fit statistic in applied psychometrics. It measures the closeness of the expected data (the data that the model would produce) to the observed data. It is usually desirable that the RMSEA be below 0.08, though some persons of ordinary skill in the art prefer that the RMSEA be below 0.05.

A “score” is a numeric value meant to represent the level or amount of the latent trait a respondent possesses. Classical test theory computes scores as the sum of item responses, while item response theory estimates these using both response patterns and item qualities.
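
One common way IRT combines response patterns and item qualities into a score is the expected a posteriori (EAP) estimate: the mean of the posterior distribution of the latent trait given a respondent's responses. The sketch below assumes dichotomous 2PL items with already-estimated parameters, a standard-normal prior, and simple rectangular quadrature on a grid from −4 to 4; it illustrates the technique and is not the scoring procedure of any particular software. The function names are hypothetical.

```python
import math


def logistic(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def eap_score(responses, items, n_points=81, lo=-4.0, hi=4.0):
    """EAP estimate (posterior mean and SD) of theta for 0/1 responses to 2PL items.

    items: list of (a, b) pairs, assumed already estimated.
    Prior: standard normal, evaluated on an evenly spaced grid (rectangular quadrature).
    """
    step = (hi - lo) / (n_points - 1)
    grid = [lo + i * step for i in range(n_points)]
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalized N(0, 1) prior density
        for x, (a, b) in zip(responses, items):
            p = logistic(a * (t - b))
            w *= p if x == 1 else 1.0 - p  # likelihood under local independence
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]  # made-up item parameters
print(eap_score([1, 1, 0], items))
```

Unlike a sum score, two respondents with the same number of correct responses can receive different EAP scores if they answered items with different discriminations.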

“Sigmoid” (literally, “S-shaped”) is an adjective occasionally used to describe the shape of the TCC or the ICC of a 2PL item.

“Simple structure” is a structural model in which each item loads onto only one factor.

“Specific factor,” in a bifactor model, is a factor onto which a set of items load.

“Structural equation modeling” (SEM) is an extension of confirmatory factor analysis (CFA) that allows relationships between latent variables like latent traits. If all latent variables in the model are latent traits, structural equation modeling (SEM) and CFA are often used interchangeably.

“Structural model” is a mathematical description that represents a system of hypotheses regarding the relationships between latent traits and items. It is depicted as a path diagram.

“Sum score” is a score computed by summing the numeric value of all responses on a measure.

“Sum score conversion” (SSC) is a table that shows the relationship between sum scores and IRT scores.

“Test characteristic curve” (TCC) is a figure that plots the relationship between sum scores and IRT scores.
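
For graded items scored 0 to k − 1, the expected item score at a given θ equals the sum of the cumulative probabilities P*(X ≥ j), so the TCC is that sum accumulated over all items. A minimal sketch assuming GRM items supplied as `(a, thresholds)` pairs (a hypothetical layout) and no 1.702 scaling constant; for items scored 1 to k, as on the FIM, add the number of items to each value.

```python
import math


def logistic(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def expected_item_score(theta, a, thresholds):
    """E[X] for a graded item scored 0..k-1: the sum of P*(X >= j) over j = 1..k-1."""
    return sum(logistic(a * (theta - b)) for b in thresholds)


def tcc_values(thetas, items):
    """Expected sum score at each theta; items is a list of (a, thresholds) pairs."""
    return [sum(expected_item_score(t, a, bs) for a, bs in items) for t in thetas]


items = [(1.5, [-1.0, 0.0, 1.0]), (1.0, [-0.5, 0.5])]  # made-up parameters
print([round(v, 2) for v in tcc_values([-2.0, 0.0, 2.0], items)])
```

Because the curve increases monotonically in θ, it can be read in either direction, which is how a TCC relates sum scores to IRT scores.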

“Testlet” is a small collection of items that measure some component of the overall latent trait. Creating a measure composed of testlets can lead to more easily interpreted scores when the latent trait is clearly defined beforehand.

“Threshold”: see “difficulty”.

“Tucker-Lewis Index” (TLI) is a fit index that compares the performance of the constructed model against the performance of a model that postulates no relationships between the variables. A good fitting model usually has a TLI of greater than 0.95.

“Three-parameter logistic model” (3PL) is an extension of the two-parameter logistic model that also includes a “guessing” parameter. For example, in a multiple-choice item with 4 choices, even guessing randomly results in a 25% chance of answering correctly. The 3PL allows for this non-zero chance of answering correctly. It is used when responses are dichotomous.

“Trace line”: see “item characteristic curve.”

“Two-parameter logistic model” (2PL) is like the Rasch Model, but allows item discriminations to vary. It may be used when item responses are dichotomous.
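
The Rasch, 2PL, and 3PL models differ only in which parameters of a single logistic curve are free to vary. A minimal sketch of the item characteristic curve, assuming the common a(θ − b) parameterization without the 1.702 scaling constant; the function name `p_correct` is hypothetical:

```python
import math


def p_correct(theta, b, a=1.0, c=0.0):
    """Probability of a correct (1) response under the 1PL/Rasch, 2PL, or 3PL model.

    a = 1 and c = 0 gives the Rasch/1PL curve; a free a with c = 0 gives the 2PL;
    0 < c < 1 adds the 3PL lower-asymptote ("guessing") parameter.
    """
    x = a * (theta - b)
    # Numerically stable logistic of x.
    core = 1.0 / (1.0 + math.exp(-x)) if x >= 0 else math.exp(x) / (1.0 + math.exp(x))
    return c + (1.0 - c) * core


for theta in (-2.0, 0.0, 2.0):
    print(theta, round(p_correct(theta, b=0.0, a=1.5), 3),
          round(p_correct(theta, b=0.0, a=1.5, c=0.25), 3))
```

For a 1PL or 2PL item the probability at θ = b is exactly 0.5, matching the definition of difficulty above; for a 3PL item it is (1 + c)/2, e.g., 0.625 when c = 0.25.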

“Unidimensional” is a term used to describe a measure that records only one latent trait.

“Variable” is a generic word used to describe a set of directly (manifest) or indirectly (latent) recorded data that measures a single thing.

“Weighted root mean-square residual/standardized root mean-square residual” (WRMR/SRMR) is a fit statistic that measures the magnitude of a model's residuals. Residuals are the differences between the observed data and the data that the model predicts. The typical recommended WRMR value is below 1.00, though this recommendation may change based on the size of the sample or the complexity of the model. The WRMR is used when there is at least one categorical variable in the model, while the SRMR is used when all variables are continuous.

FIG. 1 displays a flowchart for certain exemplary methods for preparing a preliminary outcomes measure 100 for inclusion into an electronic medical record.

In 101, an item set 200 is identified. In an embodiment, clinicians may be queried to provide their input on appropriate items to include in the item set 200, based on their training, education, and experience. Examples of clinicians may include physicians, physical therapists, occupational therapists, speech language pathologists, nurses, and patient care technicians (PCTs). Items from the item set 200 may come from a variety of outcomes measures known in the art.

In 102, the items from the item set 200 may be grouped into one or more of a plurality of areas, called “domains”, that are relevant to therapy or clinical outcomes. The clinicians may identify these domains. In an embodiment, items from the item set 200 may be grouped into three domains, titled “Self-Care”, “Mobility”, and “Cognition”. It should be understood that other groupings of additional and/or alternative domains are possible.

In 103, a related analysis step may occur. For instance, the frequency with which an item in the item set 200 is used in traditional practice to assess a patient in a medical setting may be analyzed. Alternatively, the cost of the equipment needed to administer the item may be assessed. The clinical literature may be reviewed to identify the outcomes measures with items in the item set 200 that are psychometrically acceptable and clinically useful. For instance, the reliability and validity of an outcomes measure with one or more items in the item set 200 may be reviewed to ensure it is psychometrically acceptable. As another example, each outcomes measure and/or item may be reviewed to ensure it is clinically useful. For example, while many items for testing a person's balance are available in the literature, not all of them are appropriate for patients in a rehabilitation context. Based on these and similar factors, the initial set of items may be narrowed to reduce the burden on patients, clinicians, and other health care providers.

In 104, a revised plurality of items is collected. A pilot study may be conducted on the plurality of items. The pilot study may be conducted by having clinicians assess patients on the revised items in a standardized fashion, such that each clinician assesses each patient using all of the revised items. In another embodiment, the clinicians may select which items should be used to assess a patient, based on the patient's particular clinical characteristics. The determination as to selection of specific items may be made based on information received during the patient's rehabilitation stay, for instance, at the inpatient evaluation at admission. The item may be administered at least twice during the patient's inpatient stay in order to determine patient progress. The pilot study may be facilitated using an electronic medical record system, such that clinicians enter item scores into the electronic medical record.

In 105, a pilot study analysis may be performed. For instance, items that take too much time for a clinician to conduct with a patient may be removed.

In 106, the original paper-based items for the preliminary outcomes measure 100 are implemented in an electronic medical record (EMR). Individual item-level ratings can then be recorded electronically. For instance, the items to be implemented may be the items that result from the pilot study analysis in 105; however, pilot study analysis is not required. Alternatively, the items in the preliminary outcomes measure 100 could be implemented in an electronic system, such as a database, that is external to an electronic medical record. In one embodiment, the external electronic system may be in communication with the electronic medical record, using methods that are known in the art, such as database connection technologies. In 107, the items for the preliminary outcomes measure 100 are programmed into the EMR using known methods, allowing clinicians to input their ratings into the electronic medical record. In an embodiment, the EMR may provide a prompt to alert, remind, and/or require the clinician to enter certain ratings for certain items of the preliminary outcomes measure 100. Such prompts may improve the reliability and completeness of clinician data entry into the EMR.

Although the discussion above with reference to FIG. 1 refers to the selection of certain items from various outcome measures, it should be understood that a similar method may be conducted with respect to the selection of the outcome measures themselves. For example, in 104, instead of selecting which items to be used in assessing a patient, the entire outcome measure could be selected or disregarded.

FIG. 2 displays a flowchart for electronically collecting ratings for the items in the preliminary outcomes measure 100. In 201, a clinician conducts an assessment on a patient. In one embodiment, the clinician may conduct the assessment using every item in the preliminary outcomes measure 100. In another embodiment, the clinician may conduct those tests or items in the preliminary outcomes measure 100 that are specific to that clinician's scope of practice. For example, a physical therapist may conduct those tests or items in the preliminary outcomes measure 100 that are specific to physical therapy. In yet another embodiment, the clinician may use her clinical judgement, based on her education, training, and experience, to identify the tests in the preliminary outcomes measure 100 that are most relevant to the patient. If the patient is very ill or has very limited functioning, the clinician will know not to conduct certain items. For instance, a clinician would not ask a newly quadriplegic patient to perform a test that would require the patient to walk.

The assessment may be an initial assessment conducted at or shortly after the time of admission of the patient to a hospital. In an embodiment, each patient receiving care during a period of time, such as a month or a year, is assessed. In another embodiment, a majority of patients receiving care during a period of time are assessed. In yet another embodiment, a plurality of patients are assessed. In other embodiments, the patient population may be refined to include only inpatients, only outpatients, or a combination thereof.

In various embodiments, certain tests in the preliminary outcomes measure 100 may be conducted once at or shortly after admission, and again at or shortly prior to discharge. In various embodiments, certain tests in the preliminary outcomes measure 100 may be conducted weekly. In various embodiments, certain tests in the preliminary outcomes measure 100 may be conducted more than once per week, such as twice per week.

In an embodiment, the assessments may be conducted in a centralized location dedicated to conducting assessments. The assessments may be conducted by a set of clinicians whose specific function is to conduct assessments. In a centralized location with qualified staff and adequate equipment, a patient's functional performance may be objectively assessed through a standardized process in a controlled and safe environment. In an embodiment, a clinician provides an order for a lab technician assessment. For example, a clinician (such as a physiatrist, therapist, nurse, or psychologist) orders a specific test (such as a test of gait and balance) or a group of tests. The test order may be sent electronically to the assessment department (“AAL”), and a hard copy may be printed for the patient. When the AAL is ready, the patient may travel to the AAL, with assistance if necessary. Staff, such as a technician, perform the ordered test(s). Test results may be recorded and entered into, or transmitted to, the electronic medical record. The clinician may review the test results to modify the care plan if necessary. This process can reduce the amount of time clinicians require to learn how to conduct a test. One benefit of an AAL is that other clinicians do not need to learn how to conduct various tests every time a new test is introduced; they need only learn how to read the test results. Qualified personnel with proper training can perform the tests, so clinical staff can focus on treatment rather than on assessment, and more treatment sessions or additional treatment time can be provided to improve outcomes. Test equipment is kept centrally, reducing the need for multiple units and lowering maintenance costs. Tests can be conducted in a well-controlled, standardized, and safe environment. The technician may follow standardized procedures to avoid potential rater-induced bias (a tendency toward higher ratings that show improvement over time), thus improving data quality.

The ratings from each assessment may be saved in the EMR, for instance, in a preliminary ratings dataset 150. In 202, data analysis and cleanup may be performed on the preliminary ratings dataset 150 to improve data quality. For example, out-of-range ratings may be removed from the preliminary ratings dataset 150. Patterns of data in the preliminary ratings dataset 150 from the same clinician may be reviewed and cleaned using methods known in the art. Ratings in the preliminary ratings dataset 150 from patients who show a large increase in rating from “dependent” to “independent” may also be discarded. Suspect data from a particular evaluation may be discarded.
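The cleanup rules above can be sketched in Python with pandas. This is a minimal sketch, not the disclosed implementation: the long-format column names (`patient_id`, `item`, `time`, `rating`), the single 1-7 rating range, and the rule that a rise of more than `max_jump` points between consecutive assessments marks a suspect rating are all illustrative assumptions.

```python
import pandas as pd


def clean_ratings(df: pd.DataFrame, lo: int = 1, hi: int = 7, max_jump: int = 5) -> pd.DataFrame:
    """Return the long-format ratings with suspect rows removed.

    Assumed (hypothetical) columns: patient_id, item, time, rating.
    """
    # Step 1: drop ratings outside the assumed valid range [lo, hi].
    df = df[(df["rating"] >= lo) & (df["rating"] <= hi)]
    # Step 2: order each patient's ratings on each item by assessment time.
    df = df.sort_values(["patient_id", "item", "time"])
    # Step 3: drop any rating that rises by more than max_jump points over the
    # previous rating of the same item for the same patient (e.g., 1 -> 7).
    jump = df.groupby(["patient_id", "item"])["rating"].diff()
    return df[~(jump > max_jump)].copy()
```

Dropping only the jumped rating, rather than all of that patient's ratings, is one of several reasonable policies.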

In 203, the ratings data may be further extracted, cleaned, and prepared using methods known in the art to put the data in a form in which it may be queried and analyzed. Data may be reviewed for quality, and various data operations, such as data pivoting, data merging, and creation of a data dictionary, may be performed on the preliminary ratings dataset 150. Data from the preliminary ratings dataset 150 may be stored in the EMR or in a different form, such as in a data warehouse, for further analysis. It will be understood by one of ordinary skill in the art that many ways exist to structure the data in the preliminary ratings dataset 150 for analysis. In one embodiment, the preliminary ratings dataset 150 is structured so that item ratings are available for analysis across a plurality of dimensions, such as time period and patient identification.
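One way to make item ratings available across the patient and time dimensions is to pivot the long-format ratings into a patient-by-item matrix. The sketch below assumes the same hypothetical columns as the cleanup step and keeps the last rating when a patient/time/item combination appears more than once; both choices are assumptions.

```python
import pandas as pd


def to_item_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Pivot long-format ratings (hypothetical columns patient_id, time, item, rating)
    into one row per (patient_id, time) and one column per item.
    Items a patient was not rated on become NaN."""
    return df.pivot_table(index=["patient_id", "time"], columns="item",
                          values="rating", aggfunc="last")
```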

Once the preliminary ratings dataset 150 has been prepared for analysis, a psychometric evaluation may be performed on the preliminary ratings dataset 150. A psychometric evaluation assesses how well an outcomes measure actually measures what it is intended to measure. A psychometric evaluation may include a combination of classical test theory analysis, factor analysis, and item response theory, and assesses the preliminary ratings dataset 150 for various aspects, which may include reliability, validity, responsiveness, dimensionality, item/test information, differential item functioning, and equating (score crosswalk). In one embodiment, classical test theory analysis may be employed to review the reliability of the items in the preliminary outcomes measure 100, and how the preliminary outcomes measure 100 and the domain work together.

Item Reduction. The item reduction step 152 assists in removing items from the preliminary outcomes measure 100 that do not perform as anticipated. Relevant factors can include reliability, validity, and responsiveness (also known as sensitivity to change). The purpose of the item reduction step 152 is to eliminate potential content redundancy, reducing the items in the preliminary outcomes measure 100 to a minimal subset of items in an IRT outcomes measure 180 without sacrificing the psychometric properties of the data set. The item reduction step 152 may be performed using a computer or other computing device, for instance, using a computer program 125. The computer program 125 may be written in the R programming language or another appropriate programming language. The computer program 125 allows the number of desired items (as well as specific items that must be included) to be specified, and computes the Cronbach's coefficient α reliability estimate for every possible combination of items within those user-defined constraints. Acceptable ranges of Cronbach's coefficient α may also be defined in the computer program 125. Additionally, the computer program 125 may construct and run syntax for a statistical modeling program, such as Mplus (Muthén & Muthén, Los Angeles, Calif., http://www.statmodel.com), to determine the fit of a 1-factor confirmatory factor analysis (CFA) model to each reduced subset 155 of items.
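The exhaustive search described above can be sketched as follows. The sketch is written in Python rather than R, computes Cronbach's α with the standard formula α = k/(k − 1) · (1 − Σσ²ᵢ / σ²ₓ) on complete data, and omits the Mplus CFA step; the function names and the default 0.70-0.95 range are illustrative assumptions.

```python
from itertools import combinations

import numpy as np


def cronbach_alpha(x: np.ndarray) -> float:
    """Cronbach's alpha for an (n_patients, k_items) array with no missing ratings:
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)."""
    k = x.shape[1]
    item_var = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_var / total_var)


def search_subsets(x, names, size, required=(), alpha_range=(0.70, 0.95)):
    """Compute alpha for every `size`-item subset containing all `required` items,
    keep those inside `alpha_range`, and rank them from highest to lowest alpha."""
    results = []
    for combo in combinations(range(x.shape[1]), size):
        chosen = [names[i] for i in combo]
        if not set(required) <= set(chosen):
            continue  # user-defined constraint: required items must be present
        alpha = cronbach_alpha(x[:, list(combo)])
        if alpha_range[0] <= alpha <= alpha_range[1]:
            results.append((chosen, alpha))
    return sorted(results, key=lambda r: r[1], reverse=True)
```

The ranked list plays the role of Table 1; each retained subset would then be passed to a 1-factor CFA.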

The computer program 125 may be used to analyze several of the outcomes measures included in the preliminary outcomes measure 100 (such as the FIST, BBS, FGA, ARAT, and MASA) and to search for unidimensional subsets of between four and eight items with Cronbach's α reliabilities between 0.70 and 0.95. Using these constraints, the number of items in many measures may be reduced substantially; for instance, measures may be reduced to half of their original length or less while maintaining good psychometric properties. The resulting item subsets serve as building blocks for the confirmatory factor analysis (CFA). In an embodiment, certain items, such as the items from the FIM®, may be excluded from the item reduction process. In an embodiment, the item reduction step 152 may be performed multiple times, for instance, once on each outcomes measure included in the preliminary outcomes measure 100.

In the item reduction step 152, the computer program 125 determines the extent to which items, for example items within the same outcomes measure in the preliminary outcomes measure 100, are related to each other. In one embodiment, items within an outcomes measure are related to each other if their responses correlate highly. The analysis may start from an initial core set of items, the number of which may be determined with clinician input, based on correlations between item pairs. For example, the computer program 125 may determine how item A relates to item B, where item A and item B are both in the same outcomes measure. If there is a high correlation, both item A and item B are included in the core set. Then, the computer program 125 may determine how a new item C correlates with the set of items {A, B}. If there is a high correlation, item C is included in the core set. The method may be repeated with additional items D, E, F, etc. As described above, the program assesses the reliability (Cronbach's α) of every possible subset of items. Cronbach's α is known in the art, but a brief explanation is provided here. Cronbach's α is closely related to split-half reliability, in which the responses to one part of a subset of items are correlated with the responses to the remaining part. For example, a subset of 3 items {A, B, C} can be split in three ways: A vs. BC, B vs. AC, and C vs. AB. Cronbach's α summarizes the agreement across all such splits; for a subset with an even number of items, it equals the average of the split-half reliability coefficients (computed with the Flanagan-Rulon formula) over every possible split into two equal halves. The purpose of the correlational analysis is to help ensure that the items measure the same underlying construct and to improve reliability.
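The incremental core-set construction can be sketched as a greedy search. In this hypothetical sketch, a candidate item's relation to the current core set is measured by its Pearson correlation with the core set's summed score, and a single `threshold` decides what counts as a high correlation; both are assumptions, since the description does not fix either choice.

```python
import numpy as np


def grow_core_set(x, names, threshold=0.6):
    """Greedy core-set construction (hypothetical helper).
    x: (n_patients, n_items) ratings; returns item names in the order added."""
    r = np.corrcoef(x, rowvar=False)
    np.fill_diagonal(r, -np.inf)                    # ignore self-correlations
    i, j = np.unravel_index(np.argmax(r), r.shape)  # most highly correlated pair (A, B)
    if r[i, j] < threshold:
        return []
    core = [int(i), int(j)]
    remaining = set(range(x.shape[1])) - set(core)
    while remaining:
        total = x[:, core].sum(axis=1)              # summed score of the current core set
        corr = {c: np.corrcoef(x[:, c], total)[0, 1] for c in remaining}
        best = max(corr, key=corr.get)              # candidate C, D, ... most related to the set
        if corr[best] < threshold:                  # no remaining item relates strongly enough
            break
        core.append(best)
        remaining.remove(best)
    return [names[c] for c in core]
```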

Table 1 lists an exemplary output of the item reduction step 152 for the Berg Balance Scale (“BBS”) outcomes measure, with the subset size set to five items. The numbers in the “Item” columns are the numbers of the questions on the BBS (1: sitting unsupported; 2: change of position, sitting to standing; 3: change of position, standing to sitting; 4: transfers; 5: standing unsupported; 6: standing with eyes closed; 7: standing with feet together; 8: tandem standing; 9: standing on one leg; 10: turning trunk (feet fixed)). Each reduced subset 155 is shown along with its associated Cronbach's α value. The first reduced subset has the highest Cronbach's α of the reduced subsets in Table 1. In one embodiment, the reduced subset with the highest Cronbach's α is used as the initial reduced subset for the CFA step 160, which is described below in further detail.

TABLE 1
Grouping  Item 1  Item 2  Item 3  Item 4  Item 5  Cronbach's α
1         1       2       4       5       6       0.9594018
2         1       2       4       6       7       0.9592235
3         1       2       4       6       8       0.9584257
4         1       2       4       6       10      0.9583003
5         2       6       7       8       10      0.9552663
6         1       2       4       6       9       0.9550176
7         2       4       6       8       10      0.9549165
8         2       4       6       7       10      0.9547042
9         2       4       6       7       8       0.9539102
10        1       2       6       8       10      0.9538519

Confirmatory Factor Analysis. Factor analysis is a statistical method that is used to determine the number of underlying dimensions contained in a set of observed variables and to identify the subset of variables that corresponds to each of the underlying dimensions. The underlying dimensions can be referred to as continuous latent variables or factors. The observed variables (also known as items) are referred to as indicators. Confirmatory factor analysis (CFA) can be used in situations where the dimensionality of a set of variables for a given population is already known because of previous research. CFA may be used to investigate whether the established dimensionality and factor-loading pattern fits a new sample from the same population. This is the “confirmatory” aspect of the analysis. CFA may also be used to investigate whether the established dimensionality and factor-loading pattern fits a sample from a new population. In addition, the factor model can be used to study the characteristics of individuals by examining factor variances and covariances/correlations. Factor variances show the degree of heterogeneity of a factor. Factor correlations show the strength of association between factors.
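Under a factor model with standardized items and uncorrelated factors, the model-implied correlation between two items is the sum, over factors, of the products of their loadings (the off-diagonal elements of ΛΛᵀ). The small sketch below computes that matrix; the orthogonal-factor assumption matches a bifactor model but not a CFA with correlated factors, and the function name is illustrative.

```python
import numpy as np


def implied_correlations(loadings):
    """Model-implied item correlations for standardized items and orthogonal factors.
    loadings: (n_items, n_factors) standardized loadings, or a 1-D list for one factor."""
    lam = np.asarray(loadings, dtype=float)
    if lam.ndim == 1:
        lam = lam[:, None]       # one-factor model: a single column of loadings
    r = lam @ lam.T              # off-diagonal: sum over factors of lambda_i * lambda_j
    np.fill_diagonal(r, 1.0)     # each standardized item has unit variance
    return r
```

For a one-factor model with loadings 0.8 and 0.5, the implied correlation between the two items is 0.8 × 0.5 = 0.4.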

Confirmatory factor analysis (CFA) may be performed using Mplus or other statistical software to validate how well the item composition within the pre-specified factor structure holds statistically. CFA is characterized by restrictions on factor loadings, factor variances, and factor covariances/correlations. CFA requires at least m² restrictions, where m is the number of factors. CFA can include correlated residuals that can be useful for representing the influence of minor factors on the variables. A set of background variables can be included as part of a CFA.

Mplus can estimate CFA models and CFA models with background variables for a single or multiple groups. Factor indicators for CFA models can be continuous, censored, binary, ordered categorical (ordinal), counts, or combinations of these variable types. When factor indicators are all continuous, Mplus has seven estimator choices: maximum likelihood (ML), maximum likelihood with robust standard errors and chi-square (MLR, MLF, MLM, MLMV), generalized least squares (GLS), and weighted least squares (WLS) also referred to as ADF. When at least one factor indicator is binary or ordered categorical, Mplus has seven estimator choices: weighted least squares (WLS), robust weighted least squares (WLSM, WLSMV), maximum likelihood (ML), maximum likelihood with robust standard errors and chi-square (MLR, MLF), and unweighted least squares (ULS). When at least one factor indicator is censored, unordered categorical, or a count, Mplus has six estimator choices: weighted least squares (WLS), robust weighted least squares (WLSM, WLSMV), maximum likelihood (ML), and maximum likelihood with robust standard errors and chi-square (MLR, MLF).

Using the highly reliable subsets of items from the item reduction step, a model may be defined in statistical software such as Mplus that hypothesizes that all items within a domain are interrelated. The model may also measure specific constructs within the purview of that domain. For example, all item subsets taken from Self Care measures may be hypothesized to measure Self Care, while each also simultaneously measures one of Balance, Upper Extremity Function, and Swallowing. Constructing the model in this way allows for the measurement of both an overall domain (e.g., Self Care) and a set of interrelated constructs that compose that domain (e.g., Balance, UE Function, and Swallowing, the constructs composing Self Care). Once the model is fitted to the data, its structure implies a set of expected correlations between each pair of items. The corresponding (polychoric) correlations can also be computed directly from the data; these are the observed correlations. The appropriateness of the constructed model, called “model fit” in statistics, may be determined using the root mean square error of approximation (RMSEA), which summarizes the discrepancy between the observed and expected correlations, adjusted for model complexity (degrees of freedom). In a preferred embodiment, if the RMSEA is low (for instance, less than 0.08), the model has acceptable fit.
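One common form of the RMSEA computes it from the model chi-square, its degrees of freedom, and the sample size N as sqrt(max(χ² − df, 0) / (df · (N − 1))). Software packages differ in details (some use N rather than N − 1, and robust estimators such as WLSMV use an adjusted chi-square), so this sketch illustrates the formula rather than reproducing Mplus output; the function names are illustrative.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation (one common form, using N - 1)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def acceptable_fit(chi2: float, df: int, n: int, cutoff: float = 0.08) -> bool:
    """Apply the RMSEA < 0.08 criterion described above."""
    return rmsea(chi2, df, n) < cutoff
```

A chi-square no larger than its degrees of freedom yields an RMSEA of 0, that is, no misfit beyond what sampling error would produce.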

After the CFA step 160 is applied to a reduced subset 155, the output of the CFA step 160 may contain factor loadings, including a General Factor loading. The standardized General Factor loading may range between −1 and 1, with values between 0.2 and 0.7 indicating that the factor assesses the relevant item well. The output of the CFA step 160 may provide additional factor loadings for each item. In an embodiment, each item may have a factor loading for each sub-domain. For instance, each item may have a factor loading value for Balance, a factor loading value for Upper Extremity, a factor loading value for Swallowing, and a factor loading value for each other sub-domain. In an embodiment, the factor loading value is non-zero where the item is relevant to a sub-domain.

In certain instances, applying the CFA step 160 to a reduced subset 155 can create problems that require selection of a new reduced subset 155. For instance, a General Factor loading value higher than 0.7, particularly a value close to 1.0, indicates redundancy. For example, the way items are scored on the Action Research Arm Test (ARAT) outcomes measure necessarily produces an artificially high reliability. Patients who achieve a maximum score on the first (most difficult) item are credited with having scored 3 on all subsequent items on that scale. If the patient scores less than 3 on the first item, then the second item is assessed. This is the easiest item, and patients who score 0 on it are unlikely to achieve a score above 0 on the remainder of the items and are credited with a zero for the other items. This method of scoring forces the artificially high reliability. In other instances, a standardized factor loading value greater than 1 implies that an item has a negative residual variance (which is not possible), and so the CFA step 160 must be run on a new reduced subset 155. A new reduced subset 155 may be selected from the group of reduced subsets generated by the item reduction step 152. For instance, the reduced subset with the next-highest Cronbach's α may be selected, and the CFA step 160 may then be applied to that new reduced subset.
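The reselection logic can be sketched as a loop over the reduced subsets in descending order of Cronbach's α. Here `run_cfa` is a hypothetical stand-in for fitting the CFA (for example, in Mplus) and returning standardized general-factor loadings; the 0.7 and 1.0 cut-offs follow the description above, and the function names are illustrative.

```python
def loading_problem(loadings, redundancy_cutoff=0.7):
    """Classify standardized general-factor loadings; None means no problem was found."""
    if any(abs(lam) > 1.0 for lam in loadings):
        return "negative residual variance"  # impossible solution: reselect
    if any(abs(lam) > redundancy_cutoff for lam in loadings):
        return "possible redundancy"
    return None


def first_acceptable_subset(ranked_subsets, run_cfa):
    """Walk subsets in descending Cronbach's alpha order and return the first whose
    CFA loadings pass the checks, or None if none do.
    run_cfa is a hypothetical callable: subset -> standardized loadings."""
    for subset in ranked_subsets:
        if loading_problem(run_cfa(subset)) is None:
            return subset
    return None
```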

Additionally, during the process of running the CFA step 160, it may be apparent that items designated by clinicians as falling within one sub-domain should be moved to a different sub-domain in order to improve the fit of the model used to generate the IRT outcomes measure 180 (discussed further below). For example, during the development of the embodiments described herein, items identified by clinicians as relating to “Strength” were initially placed in the Self-Care domain. In running the CFA step 160, however, it was determined that these items did not fit the model. Moving these items to the “Upper Extremity Function” sub-domain improved the fit of the model.

Table 2 below shows the fit statistics of a 1-factor CFA for each of groupings 1-10 set out in Table 1. In the CFA step 160, an assessment may be conducted as to whether the fit statistics listed in Table 2 meet the usual “good fit” criteria. In one embodiment, these criteria are RMSEA < 0.08, CFI > 0.95, TLI > 0.95, and WRMR < 1.00. Those of ordinary skill in the art will appreciate that other good-fit criteria could be used.

TABLE 2
Grouping  Factors  RMSEA  CFI    TLI    WRMR   Meets Criteria?
1         1        0.145  0.999  0.997  0.016  No
2         1        0.155  0.998  0.997  0.017  No
3         1        0.143  0.999  0.997  0.015  No
4         1        0.148  0.998  0.997  0.017  No
5         1        0.09   0.999  0.999  0.009  No
6         1        0.126  0.999  0.998  0.014  No
7         1        0.076  1      0.999  0.008  Yes
8         1        0      1      1      0.002  Yes
9         1        0.028  1      1      0.004  Yes
10        1        0.09   0.999  0.999  0.009  No
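The good-fit check can be applied mechanically to the rows of Table 2. The values below are transcribed from Table 2, the cut-offs are the ones stated above, and the function name is illustrative.

```python
# Rows transcribed from Table 2: (grouping, RMSEA, CFI, TLI, WRMR).
TABLE_2 = [
    (1, 0.145, 0.999, 0.997, 0.016),
    (2, 0.155, 0.998, 0.997, 0.017),
    (3, 0.143, 0.999, 0.997, 0.015),
    (4, 0.148, 0.998, 0.997, 0.017),
    (5, 0.09, 0.999, 0.999, 0.009),
    (6, 0.126, 0.999, 0.998, 0.014),
    (7, 0.076, 1.0, 0.999, 0.008),
    (8, 0.0, 1.0, 1.0, 0.002),
    (9, 0.028, 1.0, 1.0, 0.004),
    (10, 0.09, 0.999, 0.999, 0.009),
]


def good_fit(rmsea, cfi, tli, wrmr):
    """Good-fit criteria of one embodiment: RMSEA < 0.08, CFI > 0.95, TLI > 0.95, WRMR < 1.00."""
    return rmsea < 0.08 and cfi > 0.95 and tli > 0.95 and wrmr < 1.00


passing = [grouping for grouping, *fit in TABLE_2 if good_fit(*fit)]
print(passing)  # [7, 8, 9]
```

The result matches the “Meets Criteria?” column: only RMSEA separates the groupings here, since every grouping already satisfies the CFI, TLI, and WRMR criteria.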

Although the example above is given only with respect to one outcomes measure, the Berg Balance Scale, it should be understood that the CFA step 160 is applied to each outcomes measure in the preliminary outcomes measure 100.

Item Response Theory. In an embodiment, the IRT outcomes measure 180 may be structured to contain a plurality of high-level domains. For example, the IRT outcomes measure 180 may be structured to include a “Self Care” domain (which includes items determined to reflect a patient's capability to perform self care), a “Mobility” domain (which includes items determined to reflect a patient's capability to be mobile), and a “Cognition” domain (which includes items determined to reflect a patient's cognitive capabilities). Within each higher-level domain, specific assessment areas, also referred to as “factors” or “clusters,” may be identified. Table 3 reflects exemplary assessment areas associated with each higher-level domain.

TABLE 3
Domain     Areas/Clusters
Self-Care  Balance, UE Function, Strength, Changing Body Position, Swallowing
Mobility   Balance, W/C Skills, Changing Body Positions, Bed Mobility, Mobility
Cognition  Awareness, Agitation, Memory, Speech, Communication

Because the measurement goals of the IRT outcomes measure 180 involve measuring general domains (i.e., Self Care, Mobility, and Cognition) as well as specific assessment areas within those domains, a bifactor structure (a general factor plus domain-specific factors) may be targeted for each of the domains. The composition of the specific factors may be determined by the content of each item set. For example, items from the FIST, BBS, and FGA may be combined to form the “Balance” assessment area within the Self Care domain. Acceptable fit of the bifactor model to the data was assessed using the criterion of RMSEA < 0.08 (Browne & Cudeck, 1992), and modification indices were also computed to check for local item dependence and potential improvements to the model, such as additional cross-loadings (in which an item loads on more than one factor).
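A bifactor structure can be written as a pattern matrix in which every item loads on the general factor and on its assigned specific factor(s), with all other loadings fixed at zero. The sketch below builds such a matrix; the item-to-cluster assignments are hypothetical inputs, and passing a tuple of clusters for an item represents a cross-loading.

```python
import numpy as np


def bifactor_pattern(item_clusters, clusters):
    """Loading pattern: 1 = freely estimated, 0 = fixed at zero.
    Column 0 is the general factor; column 1 + c is specific factor clusters[c]."""
    pattern = np.zeros((len(item_clusters), 1 + len(clusters)), dtype=int)
    pattern[:, 0] = 1  # every item loads on the general factor (e.g., Self Care)
    for row, assigned in enumerate(item_clusters):
        if isinstance(assigned, str):
            assigned = (assigned,)  # a single specific factor
        for cluster in assigned:
            pattern[row, 1 + clusters.index(cluster)] = 1
    return pattern
```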

Item Response Theory reflects a mathematical model that describes the relationship between a person's ability and item characteristics (such as the item's difficulty). For example, a more able person is more likely than a less able person to succeed on a given task, and the harder the task, the lower the probability of success at any ability level. Modeling this relationship can allow a more tailored intervention based on a series of questions. Other item characteristics may be relevant as well, such as an item's “discrimination,” which is its ability to distinguish between people with high and low levels of a trait.
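The simplest case of this relationship is the two-parameter logistic (2PL) model for a pass/fail item, shown here only for illustration; the measure described below uses a graded response model for ordered ratings. The function name is hypothetical.

```python
import math


def p_success(theta: float, a: float, b: float) -> float:
    """2PL item characteristic curve: probability that a person with ability theta
    succeeds on an item with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))
```

A person whose ability equals the item's difficulty has a 50% chance of success; a larger discrimination `a` makes the probability change more sharply around that point.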

After the CFA models for each of the domains are constructed, the final structures may be coded to run in an item response theory software package, such as flexMIRT (Vector Psychometric Group, Chapel Hill, N.C., US). flexMIRT is a multilevel, multidimensional, and multiple-group item response theory (IRT) software package for item analysis and test scoring. The multidimensional graded response model (M-GRM) may be chosen to account for the ordered, categorical nature of the clinician-rated performance ratings. For example, the dimensions may be “Self-Care,” “Mobility,” and “Cognition.” Sub-domains for “Self-Care” may be “Balance,” “Upper Extremity Function,” “Strength,” “Changing Body Position,” and “Swallowing.” Sub-domains for “Mobility” may be “Balance,” “Wheelchair (W/C) Skills,” “Changing Body Positions,” “Bed Mobility,” and “Mobility.” Sub-domains for “Cognition” may be “Awareness,” “Agitation,” “Memory,” “Speech,” and “Communication.”
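For a single item, the graded response model in slope/intercept form gives the probability of a rating in category k or higher as a logistic function of the slope-weighted latent traits plus a category intercept; the probability of each individual category is the difference between adjacent cumulative probabilities. A minimal sketch follows, assuming intercepts are supplied in decreasing order; it is not flexMIRT code, and the function name and parameter values are hypothetical.

```python
import numpy as np


def grm_category_probs(theta, slopes, intercepts):
    """Multidimensional graded response model, slope/intercept form.
    theta: (m,) latent traits; slopes: (m,) item slopes;
    intercepts: (K-1,) category intercepts in decreasing order.
    Returns the probabilities of the K ordered response categories."""
    z = np.dot(slopes, theta) + np.asarray(intercepts, dtype=float)
    cum = 1.0 / (1.0 + np.exp(-z))                # P(X >= k) for k = 1..K-1
    bounds = np.concatenate(([1.0], cum, [0.0]))   # P(X >= 0) = 1 and P(X >= K) = 0
    return bounds[:-1] - bounds[1:]                # P(X = k) for k = 0..K-1


probs = grm_category_probs(np.zeros(2), [1.5, 0.8], [2.0, 0.0, -2.0])
```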

In a preferred embodiment, however, sub-domains may be reduced in order to focus on key subdomains of ability. For “Self-Care”, for example, these may be Balance, UE Function, and Swallowing. For “Cognition” these may be Cognition, Memory, and Communication. For “Mobility,” there may be no sub-domains—in other words, the sub-domains may all be clustered together.

The analysis may also be multigroup in nature. For example, the Self Care and Mobility samples may be split into groups determined by the level of balance (sitting, standing, or walking). As another example, the Cognition sample may be split into broad diagnostic categories (stroke, brain injury, neurological, or not relevant). In an embodiment, to accommodate the complexity of the models, the Metropolis-Hastings Robbins-Monro (MH-RM) algorithm (Cai, 2010) may be used for more efficient parameter estimation. MH-RM repeatedly cycles through the following three steps until the difference between the estimates from two consecutive cycles is smaller than a chosen criterion. In Step 1 (Imputation), random samples of the latent traits are imputed from a distribution implied by the item parameter estimates from the preceding cycle; in the first cycle, the distribution implied by the algorithm's starting values is used. This imputation can be performed using the Metropolis-Hastings (MH) sampler. In Step 2 (Approximation), the log-likelihood of the imputed (complete) data is evaluated. In Step 3 (Robbins-Monro Update), new parameter estimates for the next cycle are computed by applying a Robbins-Monro stochastic approximation update based on the log-likelihood from Step 2. Step 1 is then repeated using the estimates from Step 3. In the resulting estimates, slopes can reflect item discriminations and intercepts can reflect item difficulties.
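The three MH-RM steps can be illustrated with a deliberately simplified, hypothetical example: a unidimensional Rasch model (all slopes fixed at 1) with a standard normal latent trait, one Metropolis-Hastings draw per patient per cycle, a diagonal approximation to the information matrix, and a fixed number of cycles in place of the convergence criterion. It shows only the structure of the algorithm; it is not the multidimensional graded-model estimation performed by flexMIRT.

```python
import numpy as np


def _loglik(theta, d, x):
    # Rasch log-likelihood per person: sum_j [x_ij * z_ij - log(1 + exp(z_ij))], z = theta + d_j
    z = theta[:, None] + d[None, :]
    return np.sum(x * z - np.logaddexp(0.0, z), axis=1)


def mhrm_rasch(x, n_cycles=300, burn_in=50, prop_sd=1.0, seed=0):
    """Toy MH-RM for a Rasch model with a N(0, 1) latent trait.
    x: (N, J) array of 0/1 responses. Returns item intercepts and the last imputations."""
    rng = np.random.default_rng(seed)
    n, j = x.shape
    d = np.zeros(j)       # item intercepts (higher = easier item)
    theta = np.zeros(n)   # current imputed latent traits
    info = np.ones(j)     # running (diagonal) information approximation
    for k in range(1, n_cycles + 1):
        # Step 1 (Imputation): one random-walk Metropolis-Hastings draw per person
        # from p(theta | x, d), using the intercepts from the preceding cycle.
        prop = theta + rng.normal(0.0, prop_sd, size=n)
        log_ratio = (_loglik(prop, d, x) - 0.5 * prop**2) - (_loglik(theta, d, x) - 0.5 * theta**2)
        accept = np.log(rng.uniform(size=n)) < log_ratio
        theta = np.where(accept, prop, theta)
        # Step 2 (Approximation): gradient and information of the complete-data
        # log-likelihood with respect to the intercepts, given the imputations.
        p = 1.0 / (1.0 + np.exp(-(theta[:, None] + d[None, :])))
        grad = np.sum(x - p, axis=0)
        hess = np.sum(p * (1.0 - p), axis=0)
        # Step 3 (Robbins-Monro update): gain 1 during burn-in, then 1 / (k - burn_in).
        gain = 1.0 if k <= burn_in else 1.0 / (k - burn_in)
        info += gain * (hess - info)
        d = d + gain * grad / info
    return d, theta
```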

In addition to the item slopes and intercepts, maximum a posteriori (MAP) latent trait scores, which reflect the patient's level of ability, may be computed for each patient.

The principal coding for IRT focuses on translating the mathematical structure chosen after CFA into one that can be assessed using IRT. The data used for the analysis may be, for instance, simply the patients' ratings on all items on which they were assessed. For consistency, the most recent available rating for each patient on each item administered may be used. This has the convenience of putting patient scores in a particular frame of reference: the typical discharge level. MAP (maximum a posteriori) scoring may be used, but other known scoring methods could be employed instead, such as ML (maximum likelihood), EAP (expected a posteriori), or MI (multiple imputation). Different estimation methods could also be employed. For instance, marginal maximum likelihood using the expectation-maximization algorithm (MML-EM) may be used. However, this method can become computationally burdensome with more than a few dimensions, because the numerical integration it requires grows rapidly with the number of dimensions. In a preferred embodiment, Metropolis-Hastings Robbins-Monro (MH-RM) estimation is used.

Maximum a posteriori (MAP) scoring requires two inputs: the population density of the latent trait (usually assumed to be standard normal for each dimension) and the IRT parameters for each item on which a patient was rated. Multiplying the population density by the IRT response functions for each rated item (each evaluated at the rating the patient received) results in a function proportional to the posterior distribution of the patient's score; in other words, a mathematical representation of the probability of various scores, given what is known about the items and how the patient was rated on each of them. The location of the maximum of that function is the patient's MAP score.
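MAP scoring on a single dimension can be sketched with a simple grid search over θ: the log of the standard normal density is added to the log-probability of each observed rating, and the θ with the largest total is returned. The grid range and resolution, the unidimensional graded-response parameterization, and the 0-based category coding are assumptions of this sketch; production software typically uses numerical optimization instead of a grid.

```python
import numpy as np


def map_score(responses, items, lo=-4.0, hi=4.0, n_grid=8001):
    """Grid-search MAP score on one dimension.
    responses: observed category per rated item, coded 0..K-1.
    items: (slope, intercepts) per rated item, intercepts in decreasing order."""
    grid = np.linspace(lo, hi, n_grid)
    log_post = -0.5 * grid**2                    # log N(0, 1) density, up to a constant
    for x, (a, d) in zip(responses, items):
        z = a * grid[:, None] + np.asarray(d, dtype=float)[None, :]
        cum = 1.0 / (1.0 + np.exp(-z))           # P(response >= k), k = 1..K-1
        ones = np.ones((grid.size, 1))
        bounds = np.hstack([ones, cum, np.zeros((grid.size, 1))])
        probs = bounds[:, :-1] - bounds[:, 1:]   # category probabilities at each grid point
        log_post += np.log(np.clip(probs[:, x], 1e-300, None))
    return float(grid[np.argmax(log_post)])
```

With no rated items, the MAP score is the population mean of 0; each rating then pulls the peak toward the abilities most consistent with it.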

Sometimes, response options on an item are selected only very rarely, which may cause problems with estimating the IRT parameters for that item (and also suggests that the response option may be unnecessary). In such cases, those responses can be collapsed into an adjacent category. For example, if an item has responses {1, 2, 3, 4} and response 2 is very rarely seen in the data, the responses may be recoded as {1, 2, 2, 3}, so that original responses 2 and 3 share a single category. It should be understood that the actual numeric values are unimportant in IRT analysis; what matters is their order.
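A possible recoding rule is sketched below: each category observed fewer than `min_count` times is merged with the next-higher category (a rare top category joins the one below it), and the resulting groups are renumbered consecutively, which preserves order. The threshold and the merge direction are assumptions; the description only requires merging into an adjacent category.

```python
from collections import Counter


def collapse_rare(responses, categories, min_count):
    """Merge rare categories into an adjacent one and renumber 1..K'.
    Returns (recoded responses, mapping from old to new category)."""
    counts = Counter(responses)
    groups, current = [], []
    for c in categories:                 # categories in ascending (ordinal) order
        current.append(c)
        if counts[c] >= min_count:       # a well-populated category closes the group
            groups.append(current)
            current = []
    if current:                          # trailing rare categories join the last group
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)
    mapping = {c: k for k, group in enumerate(groups, start=1) for c in group}
    return [mapping[r] for r in responses], mapping
```

With response 2 rare among {1, 2, 3, 4}, the mapping is {1: 1, 2: 2, 3: 2, 4: 3}, matching the example above.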

Group composition: The IRT analyses used here can be multigroup in nature to allow for more targeted assessment. For Self Care and Mobility, patients may be grouped according to their level of balance (none, sitting, standing, and walking). Similarly, groups may be formed in the cognitive domain according to the patient's cognitive diagnosis (stroke, brain injury, neurological, or none). This method can result in multiple test forms that contain only the items appropriate for each patient. For instance, the test forms may be as follows: for “Self-Care” and “Mobility,” no balance, sitting balance, (up to) standing balance, and no balance restrictions; for “Cognition,” stroke, brain injury, neurological, or not disordered. The forms may be tailored according to group membership rather than to assessment areas. For example, the patient's balance level may affect which balance measure items appear on the Self Care and Mobility forms, while the patient's cognitive diagnosis (if any) may affect which measures appear on the Cognition form. For instance, the ABS is used only on the Brain Injury form of the Cognition measure, and the KFNAP is used only on the Stroke form.
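Form assembly by group membership can be represented as a lookup from group to measures. In this hypothetical sketch, the balance-level mapping (FIST for sitting, BBS added for standing, FGA added for walking) is an assumption inferred from what those measures assess, while the ABS and KFNAP rules follow the description above. The `core_measures` argument stands for whatever items every form in a domain shares.

```python
# Hypothetical balance-level rules for the Self-Care and Mobility forms (assumption).
BALANCE_MEASURES = {
    "none": [],
    "sitting": ["FIST"],
    "standing": ["FIST", "BBS"],
    "walking": ["FIST", "BBS", "FGA"],  # no balance restrictions
}
# Diagnosis-specific Cognition measures, per the description above.
COGNITION_MEASURES = {
    "stroke": ["KFNAP"],
    "brain injury": ["ABS"],
    "neurological": [],
    "none": [],
}


def build_form(domain: str, group: str, core_measures: list[str]) -> list[str]:
    """Return the measures on a patient's form: shared core measures plus group-specific ones."""
    if domain in ("Self-Care", "Mobility"):
        extras = BALANCE_MEASURES[group]
    elif domain == "Cognition":
        extras = COGNITION_MEASURES[group]
    else:
        raise ValueError(f"unknown domain: {domain}")
    return list(core_measures) + extras
```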

Item response theory results in a distinct score for each domain. For example, a patient may score 1.2 in the “Self-Care” domain, 1.4 in the “Mobility” domain, and 3 in the “Cognition” domain. In an embodiment, these scores may be reported to clinicians, patients, and others separately. In other embodiments, these scores may be combined into a single score. In an embodiment, a score of +1 means the patient is 1 logit above average, and a score of −1 means the patient is 1 logit below average. Values below −3 and above 3 are highly improbable, because the mathematical assumptions underlying IRT are such that scores follow a standard normal distribution. It should be recognized by one of ordinary skill in the art that other centering and scaling values could be employed instead. For instance, a score of 3 could mean the patient is average, so scores would range between 0 and 6. As another example, a score of 50 could mean the patient is average and each additional 10 points could mean 1 logit above average (so a score of 60 is 1 logit above average), so scores would range from 20 to 80.
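Rescaling is a linear transformation of the IRT score θ. The short sketch below reproduces the two alternative scales mentioned above; the function name is illustrative.

```python
def rescale(theta: float, center: float = 50.0, unit: float = 10.0) -> float:
    """Map an IRT score theta (population mean 0) to a reporting scale
    with the given center (average patient) and points per logit."""
    return center + unit * theta
```

With the defaults, θ values from −3 to 3 map to 20 to 80; with `center=3.0, unit=1.0`, they map to 0 to 6.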

An example of running the IRT step 170 is now provided with respect to the Self-Care domain. Seven factors are provided to the IRT step 170: the Self Care factor, the Balance factor, the UE Functioning factor, the Swallowing factor, a hidden factor for ARAT local dependence, a hidden factor for overcoming a negative correlation between the BBS and FGA outcomes measures, and a hidden factor specific to the FIST so that it is not overweighted in the result. The IRT step 170 (for instance, using MH-RM estimation) returns a discrimination matrix 172 and a difficulty matrix 174. For instance, these matrices may be presented in slope/intercept form, where slopes reflect item discrimination and intercepts reflect item difficulty.

Table 4 displays an exemplary discrimination matrix 172 for the Self-Care domain of an exemplary IRT outcomes measure 180. The column headings a1-a7 in Table 4 represent the following, with “hidden” factors listed in parentheses: (a1: Self Care; a2: (ARAT local dependence); a3: Upper Extremity Function; a4: Swallowing; a5: Balance; a6: (Reduction of FIST influence); a7: (Negative relationship of BBS and FGA)). Table 4 lists the slope values for each item for each factor a1-a7. The item naming in Table 4 is also reflected in Table 6 in Appendix 1, listing the items in an exemplary IRT outcomes measure 180.

TABLE 4
No.  Item                  a1     (a2)   a3     a4     a5     (a6)   (a7)
6    anteriorNudge         2.36   0      0      0      2.71   5.86   0
7    staticSitting         2.35   0      0      0      2.8    6.07   0
8    sittingEyesClosed     2.15   0      0      0      2.55   5.37   0
9    sittingLiftFoot       1.99   0      0      0      2.23   4.86   0
10   lateralReach          2.25   0      0      0      2.5    5.45   0
11   pickUpFromFloor       1.78   0      0      0      1.92   4.3    0
1    standingToSitting     0.7    0      0      0      4.05   0      2.44
2    stUnFtTgthr           0.89   0      0      0      5.65   0      3.4
3    rFwdArm               0.75   0      0      0      4.38   0      2.93
4    pickObjSt             0.72   0      0      0      4.24   0      2.79
5    turn360               0.92   0      0      0      5.25   0      3.48
12   levelSurface          1.01   0      0      0      1.81   0      −2.08
13   verticalHeadTurns     0.95   0      0      0      1.86   0      −1.93
14   pivotTurn             1.17   0      0      0      2.15   0      −2.33
15   stepOverObstacle      1.03   0      0      0      2.04   0      −2.19
16   ambulatingBackwards   1.05   0      0      0      2.02   0      −2.17
17   graspWood2p5          0.77   4.62   4.57   0      0      0      0
18   gripPourWater         0.87   5.11   5.61   0      0      0      0
19   pinchBearing3rd       0.67   3.38   4.6    0      0      0      0
20   grossHandToMouth      0.48   3.24   3.21   0      0      0      0
     nhpLpoly              0      0      2.32   0      0      0      0
     nhpRpoly              0      0      2.21   0      0      0      0
     bbLpoly               0      0      3.63   0      0      0      0
     bbRpoly               0      0      3.4    0      0      0      0
24   salivaC               1.84   0      0      2.99   0      0      0
25   tongueMovementC       3.42   0      0      5.97   0      0      0
26   tongueStrengthC       3.26   0      0      5.6    0      0      0
27   tongueCoordinationC   2.79   0      0      4.8    0      0      0
28   oralPreparationC      2.31   0      0      4.04   0      0      0
29   bolusClearanceC       2.57   0      0      4.27   0      0      0
30   oralTransitC          2.46   0      0      4.22   0      0      0
31   voluntaryCoughC       1.59   0      0      2.74   0      0      0
32   pharyngealPhaseC      2.11   0      0      3.67   0      0      0
33   pharyngealResponseC   1.29   0      0      2.2    0      0      0
     foisC                 1      0      0      1.62   0      0      0
     ricdssC               1.32   0      0      1.88   0      0      0
     eatingC               2.37   0      0      0      0      0      0
     groomingC             3.49   0      0      0      0      0      0
     bathingC              3.9    0      0      0      0      0      0
     dressingUpperC        4.87   0      0      0      0      0      0
     dressingLowerC        5.63   0      0      0      0      0      0
     toiletingC            5.08   0      0      0      0      0      0

Table 5 displays an exemplary difficulty matrix 174 for the Self-Care domain of an exemplary IRT outcomes measure 180. Table 5 displays the intercept values for each item. In the slope/intercept form of the graded response model, an item with K ordered response categories has K−1 intercepts, one for each boundary between adjacent categories; the columns d1-d6 list these intercepts in order, so items with fewer response categories have fewer entries.

TABLE 5
Item                  d1     d2     d3     d4     d5     d6
anteriorNudge         9.88   8.52   6.16   5.29
staticSitting         13.1   9.96   7.88   6.48
sittingEyesClosed     9.4    7.99   6.39   5.08
sittingLiftFoot       7.12   5.88   4.37   3.28
lateralReach          7.92   5.35   3.48   2.15
pickUpFromFloor       4.82   3.74   2.69   1.71
standingToSitting     2.07   0.65   −0.24  −3.13
stUnFtTgthr           0.04   −1.88  −2.36  −6
rFwdArm               0.19   −2.63  −3.3   −5.7
pickObjSt             −1.36  −1.87  −2.01  −5.63
turn360               −2     −4.96  −7.25  −7.79
levelSurface          4.18   0.41   −2.28
verticalHeadTurns     3.64   1.1    −2.5
pivotTurn             3.92   0.83   −1.89
stepOverObstacle      3.22   −0.87  −2.93
ambulatingBackwards   3.07   0.29   −3.45
graspWood2p5          5.25   3.4    −0.19
gripPourWater         4.22   2.54   −2.2
pinchBearing3rd       0.66   −2.66
grossHandToMouth      5.68   4.1    1.24
nhpLpoly              1.57   0.14   −1.17
nhpRpoly              1.75   0.44   −0.81
bbLpoly               1.74   0.08   −1.92
bbRpoly               1.95   0.1    −1.72
salivaC               3.78   2.76   1.01   0.35
tongueMovementC       5.26   2.22   −0.26  −3.66
tongueStrengthC       0.39   −1.37  −4.94
tongueCoordinationC   3.11   −0.08  −3.2
oralPreparationC      1.23   0.76   0.11   −2.09
bolusClearanceC       1.42   −0.5   −4.08
oralTransitC          2.95   1.53   −0.44  −3.33
voluntaryCoughC       1.15   −0.11  −1.56
pharyngealPhaseC      3.66   1.45   −3.12
pharyngealResponseC   2.3    −0.14
foisC                 0.8    0.41   0.18   −0.02  −0.79  −1.05
ricdssC               1.68   0.58   −0.29  −1     −1.65
eatingC               2.37   1.9    1.55   −1.32  −3.26
groomingC             3.31   1.77   0.74   −3.09  −8.07
bathingC              1.72   0.63   −0.74  −2.35  −6.73
dressingUpperC        3.62   2.24   0.94   −0.62  −4.97  −11.37
dressingLowerC        1.67   −0.15  −1.58  −3.7   −7.37  −14.8
toiletingC            0.72   −0.48  −1.47  −2.92  −6.25

It should be understood that a discrimination matrix 172 and a difficulty matrix 174 may be prepared for each domain in the IRT outcomes measure 180.

An exemplary score/probability response may be plotted for each administered item, where the X-axis reflects the score and the Y-axis reflects the probability of the patient's observed response. The product of these curves across items results in a likelihood curve that resembles a bell curve. The score at the peak of this curve can be used as the score for the patient.
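The likelihood-and-peak idea can be illustrated with a minimal Python sketch. It is not presented as the method of this specification: it assumes a unidimensional, slope-intercept graded response model in which P(X ≥ k) = 1 / (1 + exp(−(a·θ + d_k))), takes each slope from column a1 of Table 4 and the intercepts from the matching row of Table 5 (read left to right) for three items that load only on Self Care, and locates the peak with a simple grid search. The names `ITEMS`, `category_probs`, `likelihood`, and `peak_score` are hypothetical.

```python
import math

# Assumed graded-response parameters (slope a1 from Table 4, intercepts
# from Table 5 read left to right) for items loading only on Self Care.
ITEMS = {
    "groomingC": (3.49, [3.31, 1.77, 0.74, -3.09, -8.07]),
    "bathingC": (3.90, [1.72, 0.63, -0.74, -2.35, -6.73]),
    "dressingUpperC": (4.87, [3.62, 2.24, 0.94, -0.62, -4.97, -11.37]),
}


def category_probs(theta, a, intercepts):
    """Probability of each 0-based category: P(X >= k) - P(X >= k + 1)."""
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-(a * theta + d))) for d in intercepts]
    cumulative.append(0.0)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


def likelihood(theta, responses):
    """Product, over items, of the probability of the observed category."""
    value = 1.0
    for item, category in responses.items():
        a, intercepts = ITEMS[item]
        value *= category_probs(theta, a, intercepts)[category]
    return value


def peak_score(responses, lo=-4.0, hi=4.0, step=0.01):
    """Score at which the likelihood curve peaks, found on a grid."""
    n = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n + 1)]
    return max(grid, key=lambda t: likelihood(t, responses))


print(peak_score({"groomingC": 4, "bathingC": 4, "dressingUpperC": 5}))
```

Under these assumed parameters, responses in higher categories move the peak to a higher score; adding a prior to the likelihood before taking the peak would give a MAP-style estimate instead.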

Input from Therapists to Ensure Clinical Relevance. Each item may be labeled with a cluster that most appropriately describes its role in the IRT outcomes measure 180. This labeling may be done by clinicians on the basis of their education, training, and experience. For example, a clinician may label an item that measures balance, such as items that test function in sitting, as falling within the “Mobility” domain and the “Balance” factor in Table 1.

Because item selection (retention or removal) in the item reduction step of the analysis is based on psychometric and statistical evaluations, in an embodiment, clinical experts may review the item content covered in the reduced item sets and provide further feedback. For example, a pool of clinicians may be surveyed on whether items should be added to or removed from the subsets taken from each of the full outcomes measures. Their input may be used to construct the final models for each domain, helping to ensure that the retained items are both psychometrically sound and clinically relevant.

Remodeling to Derive Final Sets of Items. Once the negotiated item sets, reflecting both psychometric evaluation and clinical judgement, are in place, the CFA and IRT steps may be carried out. Left-out items with large clinical endorsement may be added back into the models, while included items with low endorsement may be removed. The fit of the models to the data may then be assessed using the root mean square error of approximation (RMSEA) computed during the CFA, and new item parameter estimates and latent trait scores may be computed during the IRT analysis. Table 6 in Appendix 1 to this Specification lists the items in a preferred exemplary IRT outcomes measure 180.
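The remodeling step can be sketched in Python, assuming that clinical endorsement is recorded as the fraction of surveyed clinicians who want each item in the final set. The cut-offs `add_at` and `drop_below` are hypothetical placeholders, not values from this specification. The `rmsea` function uses the common point-estimate formula RMSEA = sqrt(max(χ² − df, 0) / (df·(N − 1))), where χ² is the model chi-square, df its degrees of freedom, and N the sample size; CFA software may report a slightly different variant.

```python
import math


def revise_item_set(included, endorsement, add_at=0.75, drop_below=0.25):
    """Apply clinician endorsement to a psychometrically reduced item set.

    included    -- set of item names retained by the item-reduction step
    endorsement -- dict mapping item name to the fraction of surveyed
                   clinicians who want the item in the final set
    add_at, drop_below -- hypothetical cut-offs chosen for illustration
    """
    final = set(included)
    for item, share in endorsement.items():
        if item not in included and share >= add_at:
            final.add(item)  # left-out item with large endorsement is added back
        elif item in included and share < drop_below:
            final.discard(item)  # included item with low endorsement is removed
    return final


def rmsea(chi_square, df, n):
    """Root mean square error of approximation for a fitted model."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


print(sorted(revise_item_set({"a", "b"}, {"a": 0.9, "b": 0.1, "c": 0.8})))  # ['a', 'c']
print(rmsea(110.0, 10, 101))
```

Keeping the endorsement rule separate from the fit statistic mirrors the order described above: the item sets are negotiated first, and the refitted models are then checked for fit.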

Display

Various aspects of data relating to an individual patient's score may be displayed for a clinician and/or a patient.

FIG. 3 displays an exemplary IRT scoring system and compares it with the FIM® score known in the prior art. The IRT score reflects the amount of ability that a person, such as a patient, has, and can be a continuously scaled score across all functional categories. A score of exactly 0 means that the person has average ability at discharge; a score above 0 means that the person has above-average ability, and a score below 0 means that the person has below-average ability. FIG. 3 displays the FIM scoring for a dressing item on the FIM against the continuum of attainable Self Care IRT scores. The FIM scores are reflected by the length of each patterned section: the patterned section labeled “1” spans the IRT scores corresponding to a FIM score of 1, the section labeled “2” spans those corresponding to a FIM score of 2, and so on. Scores and difficulties are presented on the same metric, meaning that a person with an IRT score of 1.50 would be expected to score in the 6th category of the item.

The value of the IRT score becomes apparent from FIG. 3. Suppose a patient is admitted to an inpatient rehabilitation facility and their IRT score improves from −1.00 to 0.00. The equivalent change in FIM level on this item would be +3. This would be seen as a good outcome for the patient, because the patient showed functional gains.

However, the FIM score fails to register gains when progress is made within a FIM level. Suppose another patient is admitted with a score of −2.00 and progresses to −1.00. Even though this patient made as much progress as the previous patient (+1.00), she appears not to have improved her functional level on upper body dressing, because the change in the FIM for this item is 0. One benefit of the IRT score, therefore, is that it can detect improvement where the FIM cannot. In our experience, the expected change in Self Care for individuals with nontraumatic spinal cord injury and neurological injuries is fairly dramatic when measured with the IRT.
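The arithmetic behind the FIG. 3 examples can be reproduced with a short Python sketch. It assumes that each row of Table 5 lists an item's successive category intercepts d_k in the slope-intercept graded response form logistic(a·θ + d_k), so that the boundary between consecutive categories falls at θ = −d_k / a, and that the seven categories of the `dressingUpperC` item correspond to FIM levels 1 through 7. Under those assumptions, the a1 and intercept values for `dressingUpperC` give the same results as the examples above: a score of 1.50 falls in the 6th category, a change from −1.00 to 0.00 is +3 FIM levels, and a change from −2.00 to −1.00 is no change in FIM level. The function names are illustrative.

```python
# Assumed graded-response parameters for the upper body dressing item:
# slope a1 for dressingUpperC from Table 4, intercepts from Table 5.
SLOPE = 4.87
INTERCEPTS = [3.62, 2.24, 0.94, -0.62, -4.97, -11.37]

# Category boundaries on the IRT score metric: b_k = -d_k / a.
BOUNDARIES = [-d / SLOPE for d in INTERCEPTS]


def expected_fim_level(theta):
    """FIM level (1-7) whose category boundaries bracket the IRT score."""
    return 1 + sum(1 for b in BOUNDARIES if theta >= b)


def compare_change(admission_score, current_score):
    """Return (change in IRT score, change in expected FIM level)."""
    irt_change = current_score - admission_score
    fim_change = expected_fim_level(current_score) - expected_fim_level(admission_score)
    return irt_change, fim_change


print(expected_fim_level(1.50))      # 6
print(compare_change(-1.00, 0.00))   # (1.0, 3)
print(compare_change(-2.00, -1.00))  # (1.0, 0)
```

Both patients gain a full point on the IRT metric, but only the first crosses any category boundaries, which is why only the first shows a FIM change.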

FIG. 4 displays an exemplary plot of certain data relating to patient scores in the Self Care, Cognition, and Mobility domains. The percentage values 25%, 50%, 75%, and 100% reflect a score's percentage of the maximum in each domain; for instance, a score of 100% in the Self Care domain reflects a patient who received the highest possible score in that domain. The solid-line triangle reflects the patient's initial scores, which may be tabulated from assessments at or shortly after admission. The black triangle reflects the patient's current scores, and the dashed-line triangle reflects the patient's predicted scores. Reviewing the scores this way allows a clinician to determine easily the domains in which a patient has improved and the domains in which additional therapy or other care may be useful. For instance, after reviewing the plot in FIG. 4, a clinician may determine that further care should focus on Self Care and Mobility, since those scores are below the predicted scores for those domains.
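The comparison a clinician makes from FIG. 4 can be expressed as a small Python sketch. It assumes each domain has a known lowest and highest attainable score, maps the lowest to 0% (an assumption) and the highest to 100% (as described above), and flags domains whose current score is below the predicted score. The domain range and scores in the usage lines are invented for illustration.

```python
def percent_of_range(score, lowest, highest):
    """Express a domain score as a percentage of its attainable range."""
    return 100.0 * (score - lowest) / (highest - lowest)


def domains_below_prediction(current, predicted):
    """Domains whose current score falls below the predicted score."""
    return sorted(domain for domain in current if current[domain] < predicted[domain])


# Illustrative values only; the -4 to 4 range is a hypothetical domain range.
current = {"Self Care": -0.4, "Mobility": 0.1, "Cognition": 0.9}
predicted = {"Self Care": 0.2, "Mobility": 0.5, "Cognition": 0.7}
print({domain: round(percent_of_range(s, -4.0, 4.0), 1) for domain, s in current.items()})
print(domains_below_prediction(current, predicted))  # ['Mobility', 'Self Care']
```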

Prediction estimates may be derived in a variety of ways. In one embodiment, Hierarchical Linear Modeling (HLM) may be used, incorporating information about past patients' diagnoses, the severity of those diagnoses (the “case mix group,” a measure of the severity of a patient's condition within a diagnosis), the days on which measures were administered to the patients, and the scores on those days. The modeling may output a predictive curve for each severity level within each diagnosis for up to 50 days of inpatient stay. When plotting this information, the x-axis may be the number of days since admission and the y-axis may be the IRT (MAP) score.
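Fitting a full HLM does not fit in a short example, so the Python sketch below substitutes a deliberately simpler technique: an ordinary least-squares curve, score = b0 + b1·ln(1 + day), fitted separately for each (diagnosis, case mix group) pair. It omits the patient-level random effects an HLM would estimate, and the logarithmic shape is an assumption chosen only so that the predicted curve levels off over the stay. The record layout, group labels, and function names are hypothetical.

```python
import math
from collections import defaultdict


def fit_recovery_curves(records):
    """Fit score = b0 + b1 * ln(1 + day) for each (diagnosis, cmg) group.

    records: iterable of (diagnosis, cmg, days_since_admission, irt_score).
    Returns {(diagnosis, cmg): (b0, b1)}.
    """
    groups = defaultdict(list)
    for diagnosis, cmg, day, score in records:
        groups[(diagnosis, cmg)].append((math.log1p(day), score))

    curves = {}
    for key, points in groups.items():
        n = len(points)
        mean_x = sum(x for x, _ in points) / n
        mean_y = sum(y for _, y in points) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in points)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
        slope = sxy / sxx if sxx else 0.0  # flat curve if every record shares one day
        curves[key] = (mean_y - slope * mean_x, slope)
    return curves


def predict_curve(curves, diagnosis, cmg, max_day=50):
    """Predicted IRT score for days 0..max_day since admission."""
    b0, b1 = curves[(diagnosis, cmg)]
    return [b0 + b1 * math.log1p(day) for day in range(max_day + 1)]
```

A true HLM would additionally shrink sparse groups toward the overall trend and model each patient's own trajectory; the sketch only shows how a per-group curve turns into day-by-day predictions for plotting.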

Other methods of prediction could be used, including data science methods like neural networks and random forest models. Furthermore, additional patient information may be incorporated in the prediction process.

In an embodiment, a patient may be assessed using the IRT outcomes measure 180 over multiple days. For instance, the patient may be assessed on a first subset of questions from the IRT outcomes measure 180 on a first day and on a second subset of questions on a second day. The data feed may be set up to collect the most recent value of each item.
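One way a data feed could keep only the most recent value of each item is sketched below in Python; the tuple layout and the function name are assumptions for illustration.

```python
from datetime import date


def latest_item_values(assessments):
    """Keep the most recently recorded value for each item.

    assessments: iterable of (assessment_date, item_name, value) tuples.
    When two records share a date, the later one in the input wins.
    """
    latest = {}
    for when, item, value in assessments:
        if item not in latest or when >= latest[item][0]:
            latest[item] = (when, value)
    return {item: value for item, (_, value) in latest.items()}


records = [
    (date(2017, 9, 1), "staticSitting", 2),
    (date(2017, 9, 1), "eatingC", 3),
    (date(2017, 9, 2), "staticSitting", 3),
]
print(latest_item_values(records))  # {'staticSitting': 3, 'eatingC': 3}
```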

Adaptive testing may be employed, such that items in the IRT outcomes measure 180 are selected for assessment in response to the scores on items already assessed. For example, the clinician may assess the patient with the items in the IRT outcomes measure 180 from the FIST test; compute an initial IRT score from the results; and then select the next item (or a plurality of next items) most appropriate for that initial IRT score. This process may be applied iteratively until the patient's score can be determined to be accurate within a predetermined uncertainty level. For instance, once the uncertainty is at or below 0.3, the adaptive testing method may stop providing additional items for assessment and provide a final IRT score for the patient, clinician, or others to review.
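The adaptive loop can be sketched in Python under several assumptions that the text does not prescribe: a unidimensional graded response model (so the hypothetical bank uses only the Table 4 and Table 5 items that load solely on Self Care, rather than the two-factor FIST items); selection of the remaining item with the greatest Fisher information at the current score; a MAP score computed on a grid with a standard normal prior; and the posterior standard deviation as the uncertainty compared with the 0.3 threshold. The names `BANK`, `posterior`, and `adaptive_test` are hypothetical, and `respond(item)` stands for whatever records the patient's observed 0-based category.

```python
import math

# Hypothetical item bank: items loading only on Self Care, with slope a1 from
# Table 4 and intercepts read left to right from Table 5.
BANK = {
    "eatingC": (2.37, [2.37, 1.9, 1.55, -1.32, -3.26]),
    "groomingC": (3.49, [3.31, 1.77, 0.74, -3.09, -8.07]),
    "bathingC": (3.9, [1.72, 0.63, -0.74, -2.35, -6.73]),
    "dressingUpperC": (4.87, [3.62, 2.24, 0.94, -0.62, -4.97, -11.37]),
    "dressingLowerC": (5.63, [1.67, -0.15, -1.58, -3.7, -7.37, -14.8]),
    "toiletingC": (5.08, [0.72, -0.48, -1.47, -2.92, -6.25]),
}
GRID = [-4.0 + i * 0.01 for i in range(801)]  # candidate scores from -4 to 4


def cumulative(theta, a, ds):
    """P(X >= k) for each boundary, padded with 1 and 0 at the ends."""
    return [1.0] + [1.0 / (1.0 + math.exp(-(a * theta + d))) for d in ds] + [0.0]


def category_probs(theta, a, ds):
    c = cumulative(theta, a, ds)
    return [c[k] - c[k + 1] for k in range(len(c) - 1)]


def item_information(theta, a, ds):
    """Fisher information of a graded item: sum over k of P_k'^2 / P_k."""
    c = cumulative(theta, a, ds)
    dc = [a * p * (1.0 - p) for p in c]  # derivatives of the cumulative curves
    info = 0.0
    for k in range(len(c) - 1):
        p = c[k] - c[k + 1]
        if p > 1e-12:
            info += (dc[k] - dc[k + 1]) ** 2 / p
    return info


def posterior(responses):
    """MAP score and posterior SD on GRID, using a standard normal prior."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)
        for item, category in responses.items():
            a, ds = BANK[item]
            w *= category_probs(t, a, ds)[category]
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    best = max(range(len(GRID)), key=weights.__getitem__)
    return GRID[best], math.sqrt(var)


def adaptive_test(respond, first_item, stop_se=0.3):
    """Administer items until the uncertainty is at or below stop_se."""
    responses = {first_item: respond(first_item)}
    theta, se = posterior(responses)
    while se > stop_se and len(responses) < len(BANK):
        remaining = [item for item in BANK if item not in responses]
        nxt = max(remaining, key=lambda item: item_information(theta, *BANK[item]))
        responses[nxt] = respond(nxt)
        theta, se = posterior(responses)
    return theta, se, list(responses)
```

In use, `adaptive_test(respond, "eatingC")` returns the final score, its uncertainty, and the items administered in order; the loop also stops if the bank runs out before the threshold is reached.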

FIG. 5 displays an exemplary chart of certain data relating to patient scores in the Self Care domain. Each row of the chart relates to a single item; for example, the top row relates to the test of grasping a wood block. Each row is divided into differently shaded sections, as discussed with respect to FIG. 3, and the length of each section reflects how a score on that item relates to the AQ score. For example, section b1 reflects how a score of 1 on the grasping item relates to the AQ score.

FIG. 5 further shows a patient's current and expected functional status on each of the items/tasks in the Self Care domain. It should be understood that this chart could display data from Mobility, Cognition, or other domains. The “Choose Length of Stay” scroll-bar allows a clinician to compare each level of ability on every item (e.g., current vs. expected) for various lengths of stay. This can allow a clinician to determine whether additional days of inpatient stay would likely benefit a patient, and if so, by how much.

A clinician may review the IRT score alongside the patient's score on a particular FIM item to determine whether additional interventions are appropriate. For example, if the patient has an AQ score of 1, a score of 4 on the FIM toileting item is expected. If the patient's FIM toileting score is lower, the clinician can take that as an indication to adjust therapy to specifically target improved toileting.

FIG. 6A and FIG. 6B display a “FIM Explorer” section with features that allow clinicians to select and/or set a goal for each FIM-specific task. For instance, ‘4—Minimal Assistance’ is chosen in FIG. 6A as the treatment goal for the eating task. Once a task-specific goal is chosen, the comparison chart shown in FIG. 7 may be displayed. This chart can allow a therapist or other clinician to determine whether the goals are set too high or too low by comparison with the vertical line on the chart, which is derived from the selected goal ratings converted into an IRT score.

FIG. 8 displays various plots for the Self Care domain in comparison with a patient's FIM scores. As shown in FIG. 8, the assessment areas within the Self Care domain include Balance, Upper Extremity Function, and Swallowing. In one embodiment, incomplete FIM administrations may be omitted from the plot to avoid confusion about whether a score is low or merely incomplete.

Prediction

Prediction of the AQ score may be based on various factors, such as medical service group, case mix group (CMG), and/or length of stay. Within a CMG, age may be used as an additional factor in the prediction.

Data generated by predictive models may be used in various ways. For example, a patient's length of stay can be predicted via his/her medical condition, level of impairment, and other demographic and clinical characteristics. As another example, if a patient is below their prediction on a given domain, clinicians can target those areas for more focused therapies. As another example, if a patient's progress in one domain has begun to taper off, clinicians could note this and prioritize balanced treatment in that domain. As another example, given some financial information, it would be possible to assess the dollar value of expected improvement over a period of time and compare it to the cost of inpatient care over that same time frame. Discharge decisions could be made using the ratio of value of care to cost of care. Additionally, predicting success in other treatment settings is possible. Given similar assessments in other levels and locations of care (e.g., outpatient, SNF, etc.), a prospective look at the course of improvement in those settings could be determined. Better decisions regarding care in those settings could potentially be made.
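The value-to-cost comparison could be sketched in Python as follows. The sketch assumes a predicted score for each day of stay (for example, taken from a prediction curve), a hypothetical dollar value per point of improvement, and a constant daily cost of care. It compares the value of each additional day's predicted gain with that day's cost and suggests a length of stay that ends before the first day whose ratio falls below a threshold. The day-by-day (marginal) rule and the threshold of 1.0 are illustrative choices, not taken from this specification.

```python
def daily_value_to_cost(predicted_scores, value_per_point, cost_per_day):
    """Ratio of the value of each day's predicted gain to that day's cost.

    predicted_scores[0] is the score at admission; predicted_scores[i] is
    the predicted score after day i of inpatient care.
    """
    return [
        (predicted_scores[i] - predicted_scores[i - 1]) * value_per_point / cost_per_day
        for i in range(1, len(predicted_scores))
    ]


def suggested_length_of_stay(predicted_scores, value_per_point, cost_per_day, threshold=1.0):
    """Days of care before the first day whose value-to-cost ratio drops below threshold."""
    ratios = daily_value_to_cost(predicted_scores, value_per_point, cost_per_day)
    for day, ratio in enumerate(ratios, start=1):
        if ratio < threshold:
            return day - 1
    return len(ratios)


# Illustrative numbers only.
scores = [0.0, 1.0, 1.6, 1.9, 2.0]
print(daily_value_to_cost(scores, 1000.0, 400.0))
print(suggested_length_of_stay(scores, 1000.0, 400.0))  # 2
```

A cumulative ratio (total value of predicted improvement divided by total cost to date) is an equally plausible reading of the text; the marginal form is used here because it identifies the day after which further care adds less value than it costs.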

APPENDIX 1 No. Item Instructions Responses Domain and Area/Cluster 1 Standing to Please sit down. 4) Sits safely with minimal use of hands Domain: Self Care Sitting 3) Controls descent by using hands Area/Cluster: Balance 2) Uses back of legs against chair to control descent Domain: Mobility 1) Sits independently but has uncontrolled descent Area/Cluster: Balance 0) Needs assistance to sit 2 Standing Place your feet together and stand 4) Able to place feet together independently and stand for Domain: Self Care Unsupported without holding. 1 minute safely Area/Cluster: Balance with Feet 3) Able to place feet together independently and stand for Domain: Mobility Together 1 min with supervision Area/Cluster: Balance 2) Able to place feet together independently and to hold for 30 seconds 1) Needs help to attain position but able to stand 15 seconds feet together 3 Reaching Lift arm to 90 degrees. Stretch out 4) can reach forward confidently > 25 cm (10 inches) Domain: Self Care Forward with your fingers and reach forward as far 3) can reach forward > 12.5 cm safely (5 inches) Area/Cluster: Balance Outstretched as you can. (Examiner places a ruler 2) can reach forward > 5 cm safely (2 inches) Domain: Mobility Arm while at end of fingertips when arm is at 90 1) reaches forward but needs supervision Area/Cluster: Balance Standing degrees. Fingers should not touch the 0) loses balance while trying/requires external support ruler while reaching forward. The recorded measure is the distance forward that the finger can reach while the subject is in the most forward lean position. When possible, ask subject to use both arms when reaching to avoid rotation of the trunk.) 4 Pick Up Pick up the shoe/slipper which is 4) able to pick up object safely and easily Domain: Self Care Object From placed in front of your feet. 3) able to pick up object but needs supervision Area/Cluster: Balance the Floor May use an empty tissue box for 2) unable to pick up object but reaches 1-2 in. 
from object, Domain: Mobility From a testing instead of the shoe/slipper. keeps balance independently Area/Cluster: Balance Standing 1) unable to pick up and needs supervision while trying Position 0) unable to try/needs assist to keep from losing balance or falling 5 Turn 360 Turn completely around in a full 4) able to turn 360 degrees safely in 4 seconds or less Domain: Self Care Degrees circle. Pause. Then turn a full circle 3) able to turn 360 degrees safely one side only in 4 Area/Cluster: Balance in the other direction. seconds or less Domain: Mobility 2) able to turn 360 degrees safely but slowly Area/Cluster: Balance 1) needs close supervision or verbal cueing 0) needs assistance while turning 6 Anterior Light anterior nudge to superior 4) Independent (completes task independently & Domain: Self Care Nudge sternum successfully) Area/Cluster: Balance 3) Verbal cues/increased time (completes task Domain: Mobility independently & successfully and only needs more Area/Cluster: Balance 7 Static Have patient sit for 30 seconds time/cues) Domain: Self Care Sitting 2) Upper extremity support (must use UE for support or Area/Cluster: Balance assistance to complete successfully) Domain: Mobility 1) Needs assistance (unable to complete without physical Area/Cluster: Balance 8 Sitting, Eyes Have patient sit with eyes closed for assist) Domain: Self Care Closed 30 seconds 0) Dependent (requires complete physical assist, unable to Area/Cluster: Balance complete successfully even with physical assist) Domain: Mobility Area/Cluster: Balance 9 Sitting, Left Have patient sit, dominant side, lift Domain: Self Care Foot foot 1 inch twice Area/Cluster: Balance Domain: Mobility Area/Cluster: Balance 10 Lateral Reach Have patient use dominant arm, clear Domain: Self Care opposite ischial tuperosity Area/Cluster: Balance Domain: Mobility Area/Cluster: Balance 11 Pick Up Have patient pick up object from Domain: Self Care Object from floor, from between feet Area/Cluster: Balance 
Floor Domain: Mobility Area/Cluster: Balance 12 Gait, Level Walk at your normal speed from here 3) Normal: Walks 6 m (20 ft) in less than 5.5 seconds, no Domain: Self Care Surface to the next mark (6 m [20 ft]). assistive devices, good speed, no evidence for imbalance, Area/Cluster: Balance normal gait pattern, deviates no more than 15.24 cm (6 in) Domain: Mobility outside of the 30.48-cm (12-in) walkway width. Area/Cluster: Balance 2) Mild impairment: Walks 6 m (20 ft) in less than 7 seconds but greater than 5.5 seconds, uses assistive device, slower speed, mild gait deviations, or deviates 15.24-25.4 cm (6-10 in) outside of the 30.48-cm (12-in) walkway width. 1) Moderate impairment-Walks 6 m (20 ft), slow speed, abnormal gait pattern, evidence for imbalance, or deviates 25.4-38.1 cm (10-15 in) outside of the 30.48-cm (12-in) walkway width. Requires more than 7 seconds to ambulate 6 m (20 ft). 0) Severe impairment-Cannot walk 6 m (20 ft) without assistance, severe gait deviations or imbalance, deviates greater than 38.1 cm (15 in) outside of the 30.48-cm (12-in) walkway width or reaches and touches the wall. 13 Gait with Walk from here to the next mark (6 m 3) Normal-Performs head turns with no change in gait. Domain: Self Care Vertical [20 ft]). Begin walking at your Deviates no more than 15.24 cm (6 in) outside 30.48-cm Area/Cluster: Balance Head Turns normal pace. Keep walking straight; (12-in) walkway width. Domain: Mobility after 3 steps, tip your head up and 2) Mild impairment-Performs task with slight change in Area/Cluster: Balance keep walking straight while looking gait velocity (eg, minor disruption to smooth gait path), up. After 3 more steps, tip your head deviates 15.24-25.4 cm (6-10 in) outside 30.48-cm down, keep walking straight while (12-in) walkway width or uses assistive device. looking down. 
Continue alternating 1) Moderate impairment-Performs task with moderate looking up and down every 3 steps change in gait velocity, slows down, deviates 25.4-38.1 until you have completed 2 cm (10-15 in) outside 30.48-cm (12-in) walkway width repetitions in each direction. but recovers, can continue to walk. 0) Severe impairment-Performs task with severe disruption of gait (eg, staggers 38.1 cm [15 in] outside 30.48-cm (12-in) walkway width, loses balance, stops, reaches for wall). 14 Gait and Instructions: Begin with walking at 3) Normal-Pivot turns safely within 3 seconds and stops Domain: Self Care Pivot Turn your normal pace. When I tell quickly with no loss of balance. Area/Cluster: Balance you,“turn and stop,” turn as quickly 2) Mild impairment-Pivot turns safely in >3 seconds and Domain: Mobility as you can to face the opposite stops with no loss of balance, or pivot turns safely within Area/Cluster: Balance direction and stop. 3 seconds and stops with mild imbalance, requires small steps to catch balance. 1) Moderate impairment-Turns slowly, requires verbal cueing, or requires several small steps to catch balance following turn and stop. 0) Severe impairment-Cannot turn safely, requires assistance to turn and stop. 15 Step Over Begin with walking at your normal 3) Normal-Is able to step over 2 stacked shoe boxes Domain: Self Care Obstacle pace. When I tell you,“turn and stop,” taped together (22.86 cm [9 in] total height) without Area/Cluster: Balance turn as quickly as you can to face the changing gait speed; no evidence of imbalance. Domain: Mobility opposite direction and stop. 2) Mild impairment-Is able to step over one shoe box Area/Cluster: Balance (11.43 cm [4.5 in] total height) without changing gait speed; no evidence of imbalance. 1) Moderate impairment-Is able to step over one shoe box (11.43 cm [4.5 in] total height) but must slow down and adjust steps to clear box safely. May require verbal cueing. 
0) Severe impairment-Cannot perform without assistance. 16 Ambulating Walk backwards until I tell you to 3) Normal-Walks 6 m (20 ft), no assistive devices, good Domain: Self Care Backwards stop. speed, no evidence for imbalance, normal gait pattern, Area/Cluster: Balance deviates no more than 15.24 cm (6 in) outside 30.48-cm Domain: Mobility (12-in) walkway width. Area/Cluster: Balance 2) Mild impairment-Walks 6 m (20 ft), uses assistive device, slower speed, mild gait deviations, deviates 15.24-25.4 cm (6-10 in) outside 30.48-cm (12-in) walkway width. 1) Moderate impairment-Walks 6 m (20 ft), slow speed, abnormal gait pattern, evidence for imbalance, deviates 25.4-38.1 cm (10-15 in) outside 30.48-cm (12-in) walkway width. 0) Severe impairment-Cannot walk 6 m (20 ft) without assistance, severe gait deviations or imbalance, deviates greater than 38.1 cm (15 in) outside 30.48-cm (12-in) walkway width or will not attempt task. 17 Grasp Pickup 2.5 cm block 3 = task is completed in less than 5 seconds; appropriate Domain: Self Care body posture; normal hand movement components; Area/Cluster: Upper normal arm movement components Extremity Function 18 Grip Pour water from one glass to another 2 = task is completed but either with great difficulty or Domain: Self Care takes abnormally long. “Great difficulty” means abnormal Area/Cluster: Upper hand movement components (i.e. wrong grasp), abnormal Extremity Function 19 Pinch Grasp the ball bearing (or marble) arm movements (i.e. elbow does not flex as required), or Domain: Self Care using these fingers, lift it up, and abnormal body movements (i.e. trunk compensations), Area/Cluster: Upper place it in the tin on top of the shelf. “abnormally long” means between 5-60 seconds. Extremity Function Patient attempts to lift the (6 mm) 1 = Patient only partially completes the task within the 60 ball bearing (or marble) with 3^(rd) seconds, regardless of hand/arm movements patterns or finger and thumb postural requirements. 
For grasp, grip and pinch 20 Hand to Touch your mouth with the palm of subscales, score is not attainable without some form of Domain: Self Care mouth your hand. hand movement. Simply pushing the object across table Area/Cluster: Upper does not = 1. Extremity Function 0 = Given when the patient is unable to complete any part of the hand or arm movement within 60 seconds. 21 Dexterity I want to see how quickly you can Domain: Self Care pick up one block at a time with your Area/Cluster: Upper right (or left) hand. Carry it to the Extremity Function other side of the box and drop it. Make sure your fingertips cross the partition. 22 Peg Test “Pick up the pegs one at a time, using Domain: Self Care one hand only, and put them in the Area/Cluster: Upper holes as quickly as possible. You can Extremity Function do them in any order until all of the holes are filled. Then, without pausing, remove the pegs one at a time and return them to the container as quickly as you can. We will do this two times with each hand.” 23 Functional Domain: Self Care Oral Intake Area/Cluster: Scale Swallowing 24 Saliva Observe the patient's control of 5) NAD Domain: Self Care saliva. Note any escape of secretions 4) frothy/expectorated Area/Cluster: from the side of the mouth, and check 3) drooling at times Swallowing comers of mouth for wetness. Ask 2) some drool consistently the patient if he or she has noticed 1) gross drool undue saliva loss during the day, at night, or while side lying. 25 Tongue Anterior Aspect: 10) full range of motion (“ROM”) Domain: Self Care Movement Protrusion - have patient extend 8) mild impairment in range Area/Cluster: tongue as far forward as possible and 6) incomplete movement Swallowing then retract similarly. 4) minimal movement Lateralization - have patient touch 2) no movement each corner of the mouth, then repeat alternating lateral movements. With tongue, have patient attempt to clear out lateral sulci on each side of the mouth. 
Elevation - With mouth open wide, have patient raise tongue tip to alveolar ridge. Alternate elevation and depression in this way. Posterior Aspect: Elevation - have patient raise back of tongue to meet palate and hold the position 26 Tongue Have patient push laterally, against a 10) NAD Domain: Self Care Strength tongue depressor or gloved finger. 8) minimal weakness Area/Cluster: Have patient push anteriorly, against 5) unilateral weakness Swallowing a tongue depressor or gloved finger. 2) gross weakness Have patient push during elevation and depression of the tongue. Ask patient to elevate back of tongue against a tongue depressor or gloved finger. Note tone and strength to resistance. 27 Tongue Ask patient to lick around lips, 10) NAD Domain: Self Care Coordination slowly and then rapidly, touching all 8) mild incoordination Area/Cluster: parts. 5) gross incoordination Swallowing Have patient rapidly repeat tongue 2) no movement unable to assess ripe alveolar syllables /ta/. Repeat a sentence including tongue tip alveolar consonants e.g. Take Tim to tea). Ask patient to rapidly repeat velar syllables /ka/. Repeat a sentence including velar consonants e.g., Can you keep Katie clean?). 28 Oral Observe patient while eating or 10) NAD Domain: Self Care Preparation chewing. Ask to observe how bolus is 8) lip or tongue seal bolus escape Area/Cluster: prepared prior to swallowing. Check 6) minimal chew thrust gravity assisted Swallowing for loss from mouth, position of food 4) no bolus formation no attempt bolus, spread throughout oral cavity, 2) unable to examine and loss of material into lateral or anterior sulci. Note chewing movements and fatigue. 29 Bolus Observe patient eating/swallowing a 10) fully cleared Domain: Self Care Clearance bolus. 8) significant clearance/minimal residue Area/Cluster: Check oral cavity for residue 5) some clearance/residue Swallowing following a swallow. 
2) no clearance 30 Oral The clinician will position a hand 10) NAD, triggers rapidly within 1 second Domain: Self Care Transit under the patient's chin, with fingers 8) delay > 1 sec Area/Cluster: spread as per manual palpation 6) delay > 5 sec Swallowing method Logemann, 1983). Use only a 4) delay > 4 sec light touch. Ask the patient to 2) no movement observed swallow. Compare time elapsed between the initiation of lingual movement until the initiation of hyoid and laryngeal rise. 31 Voluntary Ask the patient to cough as strongly 10) NAD, strong clear cough Domain: Self Care Cough as possible. Observe strength and 8) attempt bovine Area/Cluster: clarity of cough. 5) attempt inadequate Swallowing 2) no attempt/unable to assess 32 Pharyngeal Observe hyoid and laryngeal 10) immediate laryngeal elevation clearance of material Domain: Self Care Phase movement using manual palpation 8) laryngeal elevation mildly restricted slow initiation Area/Cluster: method Logemann, 1983). Note incomplete clearance Swallowing smoothness of excursion and 5) pooling/gurgling laryngeal elevation incomplete maximal elevation point. Following 2) no swallow unable to assess swallow, ask patient to phonate/ah/ for several seconds. Note vocal quality. Ask patient to pant following swallow then vocalize. Note vocal quality Ask patient to turn head to each side and vocalize. Note vocal quality. Ask patient to lift chin and vocalize. Note vocal quality. 33 Pharyngeal Observe vocal quality and coughing 10) NAD Domain: Self Care Response as a result of swallow. To be 5) cough before/during/after swallow Area/Cluster: completed in association with other 1) not coping/gurgling Swallowing assessment tasks. 34 RIC Dysphagia supervision instructions. 6) No cueing needed to safely tolerate diet. Independent. Domain: Self Care Dysphagia 5) Needs cues 10% of the time to safely tolerate diet. Area/Cluster: Supervision Standby prompting. 
Swallowing Scale 4) Needs cues 10-25% of the time to safely tolerate diet. Minimal cues. 3) Needs cues 25-50% of the time to safely tolerate diet. Moderate cues. 2) Needs cues 50-75% of the time to safely tolerate diet. Maximal cues. 1) Needs cues 75-100% of the time to safely tolerate diet. Total assistance. 0) For NPO patients and patients who are physically dependent for eating. 35 Pressure Pressure relief instructions Domain: Mobility Relief Area/Cluster: Wheelchair Skills 36 5 Time Sit Patient sits with arms folded across Record amount of time required to complete the test. Domain: Mobility to Stand chest and with their back against a Timing begins at “Go” and stops when the patient's Area/Cluster: Changing chair. For patients with history of buttocks touch the chair on the fifth repetition. Body Positions stroke, it is acceptable to have the Domain: Mobility impaired arm at their side or in a Area/Cluster: Mobility sling. Use a chair at height from 43-45 cm. Ensure that the chair is not secured (i.e against the wall or mat). Instructions: “I want you to stand up and sit down 5 times as quickly as you can when I say ‘Go’.” Instruct to stand fully between repetitions of the test and not to touch the back of the chair during each repetition. 
37 Bed Mobility Domain: Mobility (Prone) Area/Cluster: Changing Body Positions Domain: Mobility Area/Cluster: Bed Mobility 38 Short Sit to Domain: Mobility Bed/Mat Area/Cluster: Bed (Supine) Mobility 39 Supine to Domain: Mobility long sit/ring Area/Cluster: Bed sit Mobility 40 Self ROM Domain: Mobility Area/Cluster: Bed Mobility 41 Timed Stair Domain: Mobility Climb Area/Cluster: Mobility 42 6 Minute Domain: Mobility Walk Area/Cluster: Mobility 43 Borgs RPE Domain: Mobility Area/Cluster: Mobility 44 10 Meter Domain: Mobility Walk Test Area/Cluster: Mobility 45 6 Minute Domain: Mobility Push Area/Cluster: Mobility 46 City Patient should name the city they are 3: Correct Spontaneous or upon first free recall attempt Domain: Cognition currently in 2: Correct upon logical cueing (i.e. that was yesterday, so Area/Cluster: Cognition 47 Name of Patient should name the hospital they today . . .) Domain: Cognition Hospital are currently at 1: Correct upon multiple choice or phonemic cueing Area/Cluster: Cognition 48 Month Patient should know the current 0: incorrect despite cueing, inappropriate response or Domain: Cognition month unable to respond. Area/Cluster: Cognition 49 Year Patient should know the current year Domain: Cognition Area/Cluster: Cognition 50 Clock Time Patient should know the time, may Domain: Cognition be +/−30 minutes. They can look at a Area/Cluster: Cognition clock without penalty 51 Etiology/Event Patient should know what brought Domain: Cognition them into the hospital (i.e. what Area/Cluster: Cognition brought you in today?) 52 Gaze No instructions to the patient. 3: The patient is easily able to direct his gaze toward the Domain: Cognition Orientation Instructions to clinician: observe right side of the space but does not attempt to orient the Area/Cluster: Cognition patient gaze. eyes toward the left side. 2: There are constant and clear asymmetries in the gaze direction toward the left and right sides of space. 
The patient explores the environment by looking toward the right first, and after a long delay, slowly looks toward the left. During the entire session, the patient spends much more time looking to his right side. 1: There are inconsistent but observable asymmetries in the gaze direction toward the left and right sides of space. The patient explores an environment by looking toward the right first, and then slowly toward the left with some hesitation. During the entire session the patient looks toward the right more than the left. 0: The patient spontaneously directs his/her gaze toward the right and left sides of space without hesitation and without any prompting. 53 Limb No instructions to patient. 3: The patient completely ignores the left limbs and never Domain: Cognition Awareness Instructions to clinician: observe attempts, with the assistance of the right hand, to move the Area/Cluster: Cognition patient use of limbs. left arm or and leg, or verbally acknowledge any discomfort in the left arm and leg. You cannot observe any spontaneous caring for the left limbs. 2: Time spent caring for the left limbs is short with incomplete performance. For example, during the entire session, they care for their left arm once, by moving it over to the arm rest, but for the remainder of the session they do not care much for it and let it accidentally hang outside the chair. Another example, is when asked to wash their hands, they do not wash their left hand or only wash it incidentally. Or you may think of the entire session in a continuous time period. If the patient takes care of their left limb only ⅓ of the time, give them a score of 2. 1: If the patient takes care of their left limbs ⅔ of the time, you give them a score of 1. 0: The patient pays attention and cares for their left limbs or as much as they do for their right limbs. Even if they complain of difficulty moving the left limbs or ask for help, because it means they pay attention to the left limbs. 
54 Auditory Attention
Instructions: No instructions to patient. Instructions to clinician: Make sure you are out of the patient's sightlines, and then, without warning, make a loud noise to the patient's right or left side. You can drop an object or clap loudly. Do it once on the left side, and later once on the right side. Observe whether the patient has an immediate reaction such as startling, blinking, or wincing.
Scoring:
3: The patient shows an immediate reaction to the sound from the right side but no reaction at all to the sound from the left side.
2: The patient shows an immediate reaction to the sound from the right side, but the reaction to the sound from the left side is inadequate or incorrect. For example, the patient may state they heard something but is not able to identify the location of the noise, or may shift their head or body to the side opposite the one from which the noise is actually coming.
1: The patient reacts immediately and correctly to the sound from the right but takes an observably longer time to react, or hesitates, when the sound comes from the left.
0: All the reactions observed are correct and immediate on both the left and right sides.
Domain: Cognition; Area/Cluster: Cognition
55 Personal Belongings
Instructions: Ask the patient for the location of 3 personal belongings on the patient's right and 3 on the patient's left. To be considered a personal belonging, the item must almost always be kept at a certain location by the patient. Do not hide or arrange the objects for the patient to find; preferred locations should be determined by the patient. When asking for the object location, phrase the question as "I can't find x, can you tell me where it is?" Observe how the patient looks around to locate the object and explores the environment.
Scoring:
3: The patient always locates and points to objects on the right side but fails to locate any objects on the left side.
2: The patient always locates and points to objects on the right side but fails to locate ⅔ of the objects on the left side.
1: The patient always locates and points to objects on the right side but fails to locate and point to ⅓ of the objects on the left side.
0: The patient does not hesitate to locate and point to objects on the right and left sides.
Domain: Cognition; Area/Cluster: Cognition
56 Dressing
Instructions: Ask the patient to put on an open-front shirt or coat. Look for differences in performance on the left and right sides of the body.
Scoring:
3: The patient only attempts to dress the right arm and completely ignores the left, making no attempt to put the left arm through the sleeve, and does not acknowledge a need for help.
2: The patient does not acknowledge a need for help. The patient starts by putting the right arm in the sleeve and continues with the left arm. However, the patient spends significantly less time dressing the left arm, and the shirt is very messy on the left side. In the end, the performance on the left is incomplete and ineffective.
1: The patient does not acknowledge a need for help. The patient may first attend to the right side, putting the right arm in the sleeve, and eventually, with some hesitation, work the left arm into its sleeve as well. In the end, the patient is able to put on the shirt, but the left side is not completely pulled down or does not appear as neat as the right side.
0: The patient asks for help with the left side of the body and pays attention to the left arm by trying hard to complete the task on the left side. Because the assessment measures awareness of disability, a 0 may be given to a patient who cannot complete the task but asks for help, as this indicates the patient is not neglecting the left arm.
Domain: Cognition; Area/Cluster: Cognition
57 Grooming
Instructions: Ask the patient to perform 3 grooming tasks.
Scoring:
3: In all three tasks, the patient pays attention only to the right side and always ignores the left side.
2: The patient always takes care of the right side first and misses the left side in at least one of the tasks.
1: The patient completes all three tasks in a satisfactory manner but always takes care of the right side first and spends significantly less time and effort on the left side.
0: The patient completes all three tasks with no apparent left-side asymmetry.
Domain: Cognition; Area/Cluster: Cognition
58 Distraction
Instructions: Observe the patient for short attention span, easy distractibility, and inability to concentrate.
Domain: Cognition; Area/Cluster: Cognition
59 Impulsiveness
Instructions: Observe the patient for indications of impulsiveness, impatience, and low tolerance for pain or frustration.
Domain: Cognition; Area/Cluster: Cognition
60 Cooperation
Instructions: Observe the patient for uncooperative behavior, resistance to care, and demanding behavior.
Domain: Cognition; Area/Cluster: Cognition
61 Pulling
Instructions: Observe the patient for pulling at tubes, restraints, etc.
Domain: Cognition; Area/Cluster: Cognition
62 Repetition
Instructions: Observe the patient for repetitive behaviors, motor and/or verbal.
Domain: Cognition; Area/Cluster: Cognition
Scoring (common to items 58 to 62):
1 = absent: the behavior is not present.
2 = present to a slight degree: the behavior is present but does not prevent the conduct of other, contextually appropriate behavior. (The individual may redirect spontaneously, or the continuation of the agitated behavior does not disrupt appropriate behavior.)
3 = present to a moderate degree: the individual needs to be redirected from an agitated to an appropriate behavior, but benefits from such cueing.
4 = present to an extreme degree: the individual is not able to engage in appropriate behavior due to the interference of the agitated behavior, even when external cueing or redirection is provided.
63 Behavioral Observation Profile
Domain: Cognition; Area/Cluster: Cognition
64 Pragmatic Communication Skills
Domain: Cognition; Area/Cluster: Cognition
65 Delayed Recall 3 Words
Domain: Cognition; Area/Cluster: Memory
66 Rivermead Immediate Story Retell
Domain: Cognition; Area/Cluster: Memory
67 Rivermead Delayed Story Retell
Domain: Cognition; Area/Cluster: Memory
68 Basic word description
Domain: Cognition; Area/Cluster: Communication
69 Commands
Domain: Cognition; Area/Cluster: Communication
70 Complex Ideation
Domain: Cognition; Area/Cluster: Communication
71 Word repetition
Domain: Cognition; Area/Cluster: Communication
72 Sentence repetition
Domain: Cognition; Area/Cluster: Communication
73 Form
Domain: Cognition; Area/Cluster: Communication
74 Letter choice
Domain: Cognition; Area/Cluster: Communication
75 Motor facility
Domain: Cognition; Area/Cluster: Communication
76 Picture-word matching
Domain: Cognition; Area/Cluster: Communication
77 Oral word reading
Domain: Cognition; Area/Cluster: Communication
78 Oral sentence reading
Domain: Cognition; Area/Cluster: Communication
79 Sentence/paragraph comprehension
Domain: Cognition; Area/Cluster: Communication
80 Boston Naming Test
Domain: Cognition; Area/Cluster: Communication

What is claimed is:
 1. A computer-assisted method for assessing a patient, comprising a computing device carrying out actions comprising: a. selecting a plurality of assessment items for an assessment of the patient in an assessment domain, wherein the assessment domain is one of a self-care domain, a mobility domain, and a cognition domain; b. receiving an input of the assessment of the patient in the one of the self-care domain, the mobility domain, and the cognition domain, wherein the input of the assessment includes data on performance by the patient on the plurality of assessment items; c. using item response theory (IRT) analysis to generate a domain-specific IRT score based on the received input of the assessment, wherein the IRT analysis is based on statistical models constructed for the one of the self-care domain, the mobility domain, and the cognition domain; and d. storing the generated domain-specific IRT score in an electronic medical record of the patient.
 2. The method of claim 1, wherein the plurality of assessment items are selected using IRT analysis.
 3. The method of claim 1, further comprising: retrieving the domain-specific IRT score from the electronic medical record of the patient; and adjusting a treatment plan for the patient based on the retrieved domain-specific IRT score.
 4. The method of claim 1, further comprising: generating a discrimination matrix for the one of the self-care domain, the mobility domain, and the cognition domain.
 5. The method of claim 1, further comprising: generating a difficulty matrix for the one of the self-care domain, the mobility domain, and the cognition domain.
 6. The method of claim 1, wherein the assessment domain is the self-care domain and the plurality of assessment items are in the areas of balance, upper extremity function, and swallowing.
 7. The method of claim 1, wherein the assessment domain is the mobility domain and the plurality of assessment items are in the areas of balance, wheelchair skills, changing body positions, bed mobility, and mobility.
 8. The method of claim 1, wherein the assessment domain is the cognition domain and the plurality of assessment items are in the areas of cognition, memory, and communication.
 9. The method of claim 2, wherein the plurality of assessment items are further selected using factor analysis.
 10. The method of claim 2, wherein the plurality of assessment items are further selected using classical item analysis.
 11. A method of measuring improvements in a rehabilitation patient, comprising a computing device carrying out actions comprising: a. based on item response theory (IRT) analysis performed based on a plurality of statistical models constructed for an assessment domain, selecting assessment items for assessing the patient in the assessment domain, wherein the assessment domain is one of a self-care domain, a mobility domain, and a cognition domain; b. receiving an input of an assessment of the patient in the assessment domain; c. storing the received input in an electronic medical record of the patient; d. determining a domain-specific score based on performance by the patient on the selected assessment items to measure an improvement of the patient in the assessment domain; and e. comparing the domain-specific score against a predicted score for the patient in the corresponding assessment domain, the predicted score based at least in part on information about a diagnosis associated with the patient in the corresponding assessment domain.
 12. The method of claim 11, wherein the assessment domain is the self-care domain and the selected assessment items are in the areas of balance, upper extremity function, and swallowing.
 13. The method of claim 11, wherein the assessment domain is the mobility domain and the selected assessment items are in the areas of balance, wheelchair skills, changing body positions, bed mobility, and mobility.
 14. The method of claim 11, wherein the assessment domain is the cognition domain and the selected assessment items are in the areas of cognition, memory, and communication.
 15. The method of claim 11, wherein the assessment items are further selected based on factor analysis.
 16. The method of claim 11, wherein the assessment items are further selected based on classical item analysis.
 17. The method of claim 11, further comprising: generating, in a format suitable for display on a display screen, a plot comprising the domain-specific score and the predicted score.
 18. The method of claim 11, wherein the comparison is used to i) identify an area of therapeutic need and ii) target the therapy to the area of therapeutic need. 
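Claims 1, 4, and 5 recite generating a domain-specific IRT score from item responses, together with discrimination and difficulty matrices, but do not fix a particular IRT model. The following is a minimal sketch, assuming Samejima's graded response model for ordinal item scores and an expected a posteriori (EAP) estimate of the latent trait under a standard normal prior. The function names `grm_category_probs` and `eap_score`, the item identifiers, and all parameter values are hypothetical illustrations, not the disclosed implementation; in practice the discrimination and threshold values would come from calibrating the model on patient data.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under an assumed graded response model.

    `a` is the item discrimination (> 0) and `thresholds` are strictly
    increasing category boundaries, so an item with K thresholds has score
    categories 0..K.
    """
    # Cumulative probability of scoring in category k or higher (k = 1..K).
    cum = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    # P(X >= 0) = 1 and P(X >= K + 1) = 0 close the cumulative curve.
    bounds = [1.0] + cum + [0.0]
    return [bounds[k] - bounds[k + 1] for k in range(len(thresholds) + 1)]


def eap_score(responses, items, n_points=81, lo=-4.0, hi=4.0):
    """EAP estimate of theta and its posterior SD under a standard normal prior.

    responses: item id -> observed category (0-based); only these items count.
    items: item id -> (discrimination, thresholds).
    """
    step = (hi - lo) / (n_points - 1)
    grid = [lo + i * step for i in range(n_points)]
    weights = []
    for theta in grid:
        # Unnormalized standard normal prior density at this quadrature point.
        w = math.exp(-0.5 * theta * theta)
        # Multiply in the likelihood of each observed response.
        for item_id, category in responses.items():
            a, thresholds = items[item_id]
            w *= grm_category_probs(theta, a, thresholds)[category]
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


# Hypothetical parameters for three items scored 0-3 (illustrative only).
items = {
    "item_a": (1.5, [-1.0, 0.0, 1.0]),
    "item_b": (1.0, [-0.5, 0.5, 1.5]),
    "item_c": (2.0, [-1.5, -0.5, 0.5]),
}
theta, se = eap_score({"item_a": 2, "item_b": 1, "item_c": 3}, items)
print(f"theta = {theta:.2f}, posterior SD = {se:.2f}")
```

Under this assumed model, the per-item `a` values and threshold lists play the role of the discrimination and difficulty parameters. Because the graded response model treats higher categories as higher trait levels, items where a higher score means more impairment (such as the 0 to 3 neglect items above) would need their scores reversed, or θ read as severity, before being combined with items scored in the opposite direction.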